ABSTRACT

Children learn arithmetic procedures by rote, rather
than by constructing them with an understanding of numbers. Rote
learning produces lack of flexibility, nonsensical errors, and other
difficulties. Proposed is a theory of conceptual understanding and
its role in learning and executing arithmetic procedures. The basic
hypothesis is that principles constrain the possible states of
affairs, thereby enabling learners to monitor their own performance
and correct errors. A new knowledge representation is proposed, the
state constraint. The theory has been implemented in the Heuristic
Searcher, a computer model that learns arithmetic procedures on the
basis of general principles encoded as constraints on search states.
Simulated are: (1) the discovery of a general counting procedure in
the absence of either instruction or solved examples; (2) flexible
adaptation of a counting procedure in response to changed task
demands; and (3) correction of subtraction errors in the absence of
external feedback. The theory provides novel answers to several
questions on conceptual understanding, generates testable predictions
about human behavior, deals successfully with technical issues, and
fares well on evaluation criteria. Future work will focus on how
knowledge and experience interact in procedural learning. Over 110
references are included. (Author/MNS)
Knowledge and Understanding in Human Learning



Knowledge and Understanding in Human Learning (KUL) is an umbrella term for a loosely connected set
of activities led by Stellan Ohlsson at the Learning Research and Development Center, University of
Pittsburgh. The aim of KUL is to clarify the role of world knowledge in human thinking, reasoning, and
problem solving. World knowledge consists of general principles, and contrasts with facts (episodic
knowledge) and with cognitive skills (procedural knowledge). The long-term goal is to answer four
questions: How are new principles acquired? How are principles utilized in insightful performance? How
are principles utilized in learning to perform? How can instruction facilitate the acquisition and utilization of
principled (as opposed to episodic or procedural) knowledge? Different methodologies are used to
investigate these questions: psychological experiments, computer simulation, historical studies,
semantic, logical, and mathematical analyses, instructional intervention studies, etc. A list of KUL reports
appears at the back of this report.
Abstract

School children learn arithmetic procedures by rote, rather than by constructing them on the basis of
their understanding of numbers. Rote learning produces lack of flexibility, nonsensical errors, and other
difficulties in learning. Mathematics educators have proposed that if arithmetic procedures were
constructed under the influence of conceptual understanding of the principles of arithmetic, then
procedure acquisition would not suffer from these difficulties. However, little effort has been invested
in conceptual analysis of this hypothesis, or in proving its viability. We propose a theory of conceptual
understanding and its role in the learning and execution of arithmetic procedures. The basic hypothesis
of the theory is that principles constrain the possible states of affairs, and thereby enable the learner to
monitor his/her own performance and to correct his/her errors. We propose a new knowledge
representation, the state constraint, which captures this view of principled knowledge. The state
constraint theory has been implemented in the Heuristic Searcher (HS), a computer model that learns
arithmetic procedures on the basis of general principles encoded as constraints on search states. We
have simulated (a) the discovery of a correct and general counting procedure in the absence of either
instruction or solved examples, (b) flexible adaptation of an already learned counting procedure in
response to changes in the task demands, and (c) the correction of errors in multi-column subtraction in
the absence of external feedback. The state constraint theory provides novel answers to several
questions with respect to conceptual understanding in arithmetic, generates counter-intuitive but testable
predictions about human behavior, deals successfully with technical issues that cause difficulties for other
explanations of the function of knowledge in learning, and fares well on evaluation criteria such as
generality and parsimony. The state constraint theory is incomplete; it does not explain how procedure
acquisition proceeds in the absence of conceptual understanding, or how learners overcome errors that
cannot be described as violations of principles. Future work will focus on the question of how knowledge
and experience interact in procedural learning.
Rote vs. Meaningful Learning in Arithmetic

School children tend to learn arithmetic procedures by memorizing them, rather than by constructing
them on the basis of their understanding of numbers. Consequently, they execute those procedures
mechanically, as sequences of physical actions on written characters rather than as abstract operations
on numbers. If they arrive at correct answers, it is because they recall the relevant procedure accurately,
not because they understand the underlying mathematical concepts and principles.

Rote learning of arithmetic procedures has several negative consequences. Memorized procedures 
are brittle. They lack the flexibility required to transfer to unfamiliar problems or even to minor variations 
of familiar problems. Students often fail on a novel task that is conceptually equivalent to, but
procedurally distinct from, some other, already mastered task. Inability to adapt a procedure to changes
in the task implies that each new task has to be learned separately. 

Memorized procedures are also prone to nonsensical errors. For instance, in the so-called
SMALLER-FROM-LARGER error in multi-column subtraction (Brown & Burton, 1978; Burton, 1982), the
student subtracts the smaller number from the larger within each column without regard for which number
belongs to the minuend and which number belongs to the subtrahend. In the so-called freshman error
(Silver, 1986, p. 189) in the addition of fractions, the learner adds the denominators as well as the
numerators, and in what we might call the decimal-as-integer error, the learner judges the relative size of
decimal fractions on the basis of their integer values (Hiebert & Wearne, 1986, p. 205). These errors are
nonsensical because they violate the meaning of the corresponding arithmetic operations. Nonsensical
errors slow down learning because they resist remedial instruction.
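To make the three error types concrete, the following Python sketch writes each one down as a small procedure. The function names and the particular problems (52 - 17, 1/2 + 1/3, 0.25 versus 0.3) are our own illustrative choices, not items taken from the cited studies; each buggy answer is printed next to the correct one.

```python
def column_digits(n: int, width: int) -> list[int]:
    """Digits of n, least significant column first, padded with zeros to `width`."""
    return [(n // 10**i) % 10 for i in range(width)]


def smaller_from_larger(minuend: int, subtrahend: int) -> int:
    """SMALLER-FROM-LARGER bug: in every column, subtract the smaller digit from the
    larger one, ignoring which digit belongs to the minuend."""
    width = max(len(str(minuend)), len(str(subtrahend)))
    top = column_digits(minuend, width)
    bottom = column_digits(subtrahend, width)
    return sum(abs(t - b) * 10**i for i, (t, b) in enumerate(zip(top, bottom)))


def freshman_add(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Freshman error: a/b + c/d computed as (a + c)/(b + d)."""
    return a + c, b + d


def decimal_as_integer_larger(x: str, y: str) -> str:
    """Decimal-as-integer error: pick the 'larger' of two decimal fractions of the
    form 0.xxx by comparing the digits after the point as whole numbers."""
    return x if int(x.split(".")[1]) > int(y.split(".")[1]) else y


print(smaller_from_larger(52, 17), 52 - 17)                 # 45 35
print(freshman_add(1, 2, 1, 3), "vs 5/6")                   # (2, 5) vs 5/6
print(decimal_as_integer_larger("0.25", "0.3"), "vs 0.3")   # 0.25 vs 0.3
```

Each procedure runs to completion without any internal signal that something is wrong, which is what makes such errors hard to notice from the inside.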

Finally, memorized procedures resist being incorporated as subprocedures into higher-order
procedures. Students often fail to perform steps A and B in combination, even though they are capable of
performing both A and B in isolation. For instance, we have observed in our field studies children who
know how to put two fractions on the same denominator and who also know how to add two fractions with
equal denominators, but who nevertheless are unable to figure out how to add two fractions with unequal
denominators.¹ Since mathematics is a hierarchically organized subject matter, inability to build on
previously mastered procedures severely limits the mathematics that can be learned.
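The fraction example can be written as two subprocedures and one higher-order procedure that composes them. This is a minimal Python sketch under our own assumptions: the names are hypothetical, fractions are passed as numerator/denominator integers, and the final reduction to lowest terms is an extra step we added for tidy output.

```python
from math import gcd


def to_common_denominator(a: int, b: int, c: int, d: int) -> tuple[int, int, int]:
    """Subprocedure A: rewrite a/b and c/d over their least common denominator."""
    lcd = b * d // gcd(b, d)
    return a * (lcd // b), c * (lcd // d), lcd


def add_equal_denominators(a: int, c: int, den: int) -> tuple[int, int]:
    """Subprocedure B: add two fractions that already share the denominator `den`."""
    return a + c, den


def add_fractions(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Higher-order procedure: A followed by B, then reduce to lowest terms."""
    a2, c2, lcd = to_common_denominator(a, b, c, d)
    num, den = add_equal_denominators(a2, c2, lcd)
    g = gcd(num, den)
    return num // g, den // g


print(add_fractions(1, 2, 1, 3))  # (5, 6)
```

The composition itself is one line of glue; the difficulty the children showed lies in seeing that the output of A is exactly the input B requires.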

The working hypothesis that dominates current research in mathematics education is that conceptual
understanding is the cure for these negative effects. We will refer to this belief as the Conceptual
Understanding Hypothesis. If children understood what they are doing, this hypothesis claims, children
could discover procedures on their own, learned procedures would be flexible, nonsensical errors would
be corrected spontaneously (or at least not be resistant to remediation), and already mastered
procedures would easily combine to form higher-order procedures. The Conceptual Understanding
Hypothesis claims that procedures can be derived from the learner's knowledge, in contrast to being
derived either from experience or from an external source such as a teacher or a textbook. In previous
work we called this type of learning rational learning (Ohlsson, 1986, 1987b; Ohlsson & Rees, 1987). The
Conceptual Understanding Hypothesis extends the idea of rational learning by claiming that procedures
which are derived from knowledge are more flexible and less error-prone than procedures that are
learned in other ways.

¹Our empirical research on the learning of fractions will be reported elsewhere.

Common sense strongly supports the Conceptual Understanding Hypothesis, but, as Brooks and
Dansereau (1987, pp. 134-136) point out in their recent review of what they call content-to-skill transfer, it
has been the subject of a surprisingly small amount of systematic research. There are scattered studies
that demonstrate a facilitating effect of understanding a principle on subsequent problem solving (Egan &
Greeno, 1973; Mayer, Stiehl, & Greeno, 1975; Katona, 1967). However, the strongest case for
conceptually based procedure acquisition has been made by Gelman and co-workers with respect to
counting (Gelman & Gallistel, 1978; Gelman & Meck, 1983, 1987; Gelman, Meck, & Merkin, 1986;
Greeno, Riley, & Gelman, 1984). Gelman and Gallistel (1978) formulated a set of principles that
determine the correct procedure for counting. The three most important are the One-One Mapping
Principle, the Cardinal Principle, and the Stable Order Principle. The One-One Mapping Principle states
that each object should be assigned exactly one number. The Cardinal Principle states that the last
number to be assigned to an object is also the answer to the counting problem. The Stable Order
Principle states that the numbers have to be considered in numerical order. Gelman and co-workers
have presented evidence for the hypothesis that children know these principles before they have a
procedure that enables them to count correctly, and that they construct their counting procedures on the
basis of these principles. The evidence includes the facts that children typically acquire the correct
procedure for counting without formal instruction in counting, and that their counting procedures are
flexible. Children readily adapt their procedures for counting to non-standard counting tasks, such as
counting objects in a particular order, or in such an order that a specified object is assigned a specified
number (Gelman & Gallistel, 1978). Greeno, Riley, and Gelman (1984) and Smith, Greeno, and Vitolo (in
press) have proposed a theoretical analysis that shows how flexible counting performance can be derived
by a planning mechanism from a set of action schemata that embody the counting principles, thus lending
support to this interpretation of the evidence. In short, research suggests that the normal acquisition of
counting in our culture exemplifies the Conceptual Understanding Hypothesis.²
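The three principles can be read as checks on a finished count. The Python sketch below is our own illustration, not Gelman and Gallistel's formalism nor the planning analysis of Greeno, Riley, and Gelman: the function names, the representation of a count as (object, number word) pairs, and the toy objects are assumptions. It also suggests why a non-standard task such as "make the doll number three" is easy once the principles are known: none of the three principles fixes which object is counted first, so any order that puts the target in the required slot satisfies all three.

```python
def check_counting_principles(objects, assignment, count_list, answer):
    """Check one counting performance against three counting principles.

    objects:    the (distinct) objects to be counted
    assignment: (object, number_word) pairs, in the order they were made
    count_list: the conventional sequence of number words
    answer:     the number word the counter reports as the result
    """
    counted = [obj for obj, _ in assignment]
    words = [word for _, word in assignment]
    return {
        # One-One Mapping: every object gets exactly one word, and no word is reused.
        "one_one": sorted(counted) == sorted(objects) and len(set(words)) == len(words),
        # Stable Order: the words are used in their conventional order, from the start.
        "stable_order": words == count_list[: len(words)],
        # Cardinal: the last word assigned is reported as the answer.
        "cardinal": bool(words) and answer == words[-1],
    }


def count_with_target(objects, target, position, count_list):
    """Non-standard task: count so that `target` receives the `position`-th word
    (1-based). The remaining objects fill the other slots in their given order."""
    others = [obj for obj in objects if obj != target]
    order = others[: position - 1] + [target] + others[position - 1 :]
    return list(zip(order, count_list))


words = ["one", "two", "three", "four", "five"]
toys = ["doll", "ball", "car", "block"]
plan = count_with_target(toys, "doll", 3, words)
print(plan)
print(check_counting_principles(toys, plan, words, plan[-1][1]))
# {'one_one': True, 'stable_order': True, 'cardinal': True}
```

A count that skips a word or tags one object twice fails the corresponding check, which is the sense in which the principles constrain, without dictating, the counting procedure.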

If counting represents a clear example of knowledge-based procedure acquisition in arithmetic, then
multi-column subtraction represents the opposite. The evidence for rote learning is particularly strong
with respect to this procedure. Over one hundred distinct error types have been identified in children's
subtraction performances, most of them as nonsensical as the prototypical smaller-from-larger error
mentioned above (Brown & Burton, 1978; Burton, 1982; Young & O'Shea, 1981). Kurt VanLehn has
proposed a theory that assumes that understanding of, say, place value does not enter into the
acquisition of the procedure for multi-column subtraction as it actually occurs in the classroom (Brown &
VanLehn, 1980, 1982; VanLehn, 1983a, 1983b, 1985a, 1985b, 1986). According to his theory, children
pay little attention to, or are intellectually unequipped to make much use of, teachers' explanations of the
subtraction procedure. Instead, they construct the procedure by induction over the solved examples
provided by textbooks and teachers. If the resulting procedure is incomplete, the learner may encounter
situations in which the procedure cannot be executed, so-called impasses. The learner is hypothesized
to respond to such difficulties by making local changes in the procedure. VanLehn's theory explains a
significant proportion of the empirically observed procedural errors for multi-column subtraction, thus
strongly supporting the notion that children learn the subtraction procedure by rote.

²The conclusion that children know the counting principles before they learn counting procedures is not uncontested. Piaget
(1952) concluded on the basis of his research that children do not understand number in the pre-operational stages, because the
construction of number is coordinated with the construction of logical operations. Brainerd (1979) has argued on the basis of
extensive empirical studies that the notion of ordinality develops before the notion of cardinality, a conclusion which complicates the
relation between counting and the Cardinality Principle. Both Fuson & Hall (1983) and Briars and Siegler (1984) have proposed
accounts of children's counting that assume that procedures are learned before principles. Baroody & Ginsburg (1986, pp. 76-78)
agree with this view. This view is further supported by recent studies by Douglas Frye, Nicholas Braisby, John Lowe, Celine
Maroudas, and Jon Nicholls at the University of Cambridge, England (personal communication). Since our purpose in this report is
to present a computational interpretation of the Conceptual Understanding Hypothesis, rather than to make a critical appraisal of the
empirical literature, we have adopted the principles-first view as our working hypothesis. Clearly, the Conceptual Understanding
Hypothesis retains its interest as a pedagogical stance, even if the debate about children's counting should ultimately be resolved in
favour of the procedures-first view.

In summary, research has provided us with in-depth analyses of two contrasting examples of
procedure acquisition in arithmetic. The case of counting exemplifies procedure acquisition based on
understanding of the relevant principles, and the case of subtraction exemplifies procedure acquisition
through memorization. The subtraction research is silent on the question of whether conceptual
understanding could facilitate the learning of subtraction. It only makes the case that the acquisition of
the subtraction procedure as it currently occurs in schools does not, in fact, engage the learner in the
mathematics that underlies that procedure. The pedagogical hope expressed in the Conceptual
Understanding Hypothesis is that the subtraction procedure could be acquired in the same intelligent
manner as the counting procedure, if only children understood the principles of subtraction as well as they
understand the principles of counting.

The obvious instructional implication of the Conceptual Understanding Hypothesis is that we need to
find ways of teaching children to understand the conceptual underpinnings of arithmetic procedures. A
significant proportion of research in mathematics education is directed towards this goal (see, e.g., Bell,
Costello, & Kuchemann, 1983; Davis, 1984; Hiebert, 1986; Romberg & Carpenter, 1986; Schoenfeld,
1985; Silver, 1985).

The research reported here has a different purpose. Our goal is to clarify the nature of the
hypothesized link between conceptual understanding and procedure acquisition. How does conceptual
understanding facilitate procedure acquisition? In a major review of the psychology of mathematics,
Resnick and Ford (1981) summarized the state of the research with respect to this question as follows:
The relationship between computational skill and mathematical understanding is one of the oldest
concerns in the psychology of mathematics. It is also one that has consistently eluded successful
formulation as a research question. ... Instead of focusing on the interaction between computation and
understanding, between practice and insight, psychologists and mathematics educators have been busy
trying to demonstrate the superiority of one over the other. ... The relationships between skill and
understanding were never effectively elucidated. What is needed, and what now seems a possible research
agenda, is to focus on how understanding influences the acquisition of computational routines.

(Resnick & Ford, 1981, p. 246)

Information-processing analyses of human cognition imply that an analysis of the relation between
conceptual understanding and performance consists of two components: a representation for conceptual
understanding plus a computational machinery that can derive a procedure for a particular task from that
understanding (Greeno, Riley, & Gelman, 1984; Smith, Greeno, & Vitolo, in press). Such an analysis
should explain how conceptual understanding is represented in memory, how it functions in performance,
and how it can facilitate learning. The work reported here is based on this formulation of the problem.

We approach this problem by building a computer model of learning that instantiates the Conceptual
Understanding Hypothesis. Such a model has many uses. First, the model can provide what is known as
a sufficiency proof (Newell and Simon, 1959, p. 5). The model can provide a concrete demonstration that
the kind of learning that mathematics educators envision is, in fact, possible. Second, the model can
serve as a tool for generating predictions from a particular set of hypotheses about understanding. Third,
it can serve as a focus of debate. Other researchers may not agree that our model represents learning as
it actually occurs in, say, the case of counting, or as it ought to proceed in the classroom. The formulation
of alternative interpretations of the Conceptual Understanding Hypothesis ought to be facilitated by
having something precise to disagree with. Fourth, our model can serve as a tool for the planning of
empirical studies of the role of conceptual understanding in the learning of procedures. Fifth, it can be the
basis of diagnostic instruments that focus on misconceptions rather than on bugs (Langley, Wogulis, &
Ohlsson, in press). Sixth, it can facilitate comparison between the Conceptual Understanding Hypothesis
and other hypotheses being explored in current research on learning. Seventh, it can be used to derive
instructional implications that can be tested in classroom interventions.

The report is organized as follows. We begin by stating a theory of conceptual understanding and its
relation to performance and to procedure acquisition (The State Constraint Theory of Understanding, p.
8). In the second section we describe a computer model based on this theory (A Computer Model, p. 19),
and in the following section (Computational Results, p. 28) we report on three applications of the model:
(a) the construction of a counting procedure in the absence of explicit instruction or solved examples, (b)
the adaptation of an existing counting procedure to changes in the counting task, and (c) the spontaneous
correction of procedural errors in multi-column subtraction. We then compare our work with previous
efforts to simulate procedure acquisition in arithmetic (Relation to Previous Research, p. 60) and discuss
its implications (General Discussion, p. 89).
The State Constraint Theory of Understanding

A theory of the role of conceptual understanding in the acquisition of procedures consists, at the
broadest level of analysis, of two components: a representation for conceptual understanding and a
computational machinery that maps that understanding onto a procedure for a particular task (Greeno,
Riley, & Gelman, 1984; Smith, Greeno, & Vitolo, in press). More specifically, such a theory should
answer at least the following questions:

1. What is the nature of conceptual understanding, and how is it represented in the mind?
What kind of cognitive structures are we referring to when we speak of someone as
understanding, say, multi-column subtraction?

2. What function does conceptual understanding have in performance? How does
understanding interact with the procedure during execution? What is the difference
between executing a procedure correctly and with understanding, as opposed to executing
it correctly but without understanding?

3. What function does conceptual understanding have in the learning of procedures? By what
mechanism does understanding enter into the construction of a procedure? How does
understanding enable the learner to discover a procedure, to apply a procedure in a flexible
manner, to correct nonsensical errors, and to combine procedures into higher-order
procedures?

The theory proposed here is based on the idea that learners act with understanding when they
internally monitor their performance on a problem by comparing the successive states of the problem with
what they know about the task environment. According to this theory, learners execute the procedure for,
say, multi-column subtraction with understanding when they think about each state of the subtraction
problem in terms of the principles of arithmetic. Learning occurs when an incorrect or incomplete
procedure generates a problem state that is inconsistent with the principles that govern the task
environment.³ Cognitive change is in the direction of greater consistency between the learner's actions
and the structure of the task environment (to the extent that the latter is known to the learner). For
instance, an incorrect subtraction procedure may result in a difference between two integers that is larger
than the minuend. To the extent that the learner knows that n - m = r implies r < n (for m > 0), the
subsequent revision of the regrouping procedure is in the direction of preventing violations of this principle
in future applications of that procedure, or so the theory claims.
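A minimal Python sketch of this monitoring idea, under our own assumptions: a problem state is a dictionary holding minuend, subtrahend, and result, and a constraint is split into a relevance test (does the principle apply to this state?) and a satisfaction test (does the state conform to it?). The names `StateConstraint` and `monitor`, and that two-part split, are hypothetical illustrations, not the HS implementation described later in the report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]  # a (mental) problem state; here, a finished subtraction


@dataclass
class StateConstraint:
    """Assumed encoding: a principle applies to a state when `relevant` holds, and
    the state violates it when the principle applies but `satisfied` fails."""
    name: str
    relevant: Callable[[State], bool]
    satisfied: Callable[[State], bool]

    def violated_by(self, state: State) -> bool:
        return self.relevant(state) and not self.satisfied(state)


# n - m = r with m > 0 implies r < n
DIFFERENCE_BELOW_MINUEND = StateConstraint(
    name="difference below minuend",
    relevant=lambda s: "result" in s and s["subtrahend"] > 0,
    satisfied=lambda s: s["result"] < s["minuend"],
)


def monitor(state: State, constraints: List[StateConstraint]) -> List[str]:
    """Internal monitoring: return the names of all constraints the state violates."""
    return [c.name for c in constraints if c.violated_by(state)]


# 31 - 9 answered with the smaller-from-larger bug (38) versus correctly (22).
print(monitor({"minuend": 31, "subtrahend": 9, "result": 38}, [DIFFERENCE_BELOW_MINUEND]))
print(monitor({"minuend": 31, "subtrahend": 9, "result": 22}, [DIFFERENCE_BELOW_MINUEND]))
```

A single constraint does not catch every error: the buggy answer 45 to 52 - 17 is still smaller than 52, so detecting it would require a further principle (for instance, r + m = n).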

The purpose of this section is to state our hypotheses about understanding, about performance, and
about learning. In the next section we describe a computer model that instantiates these hypotheses (A
Computer Model, p. 19). In a later section we describe some results obtained by running the model
(Computational Results, p. 28). The simulation runs show that the hypotheses stated here predict
learning behavior that is consistent with the Conceptual Understanding Hypothesis.

³It may seem as if problem states that violate the principles of the environment are impossible in non-symbolic domains. For
instance, one cannot construct, say, an electronic circuit that violates the principles of electricity. However, the term "problem state"
as used in our theory refers to the mental representation of the state of the problem, not to the physical problem situation. This point
will be clarified in the subsection that presents our performance theory (Hypotheses about performance, p. 12).

Hypotheses about understanding 

In this report we use the term "understanding" to refer to a collection of general principles about the 
environment, formulated as constraints on the possible states of affairs. We unpack this notion in four 
steps. 

Understanding consists of knowledge about the task environment 

The Conceptual Understanding Hypothesis claims that correct and flexible performance is achieved
when the learner constructs the required procedure on the basis of his/her understanding. The type of
understanding that we focus on in this research is understanding of the domain in which a procedure
operates. To understand a domain is to know the principles that govern the objects and events in that
domain. For instance, to understand electricity is to know the principles that govern the behavior of
electric currents; to understand arithmetic is to know the laws of numbers. This type of understanding is
central to the learning-by-doing scenario, in which the learner constructs a procedure in the absence of
instruction.

An alternative view is that to understand a procedure is to know the purpose of each step in the
procedure. Such an understanding is sometimes called a teleological semantics for the procedure
(VanLehn & Brown, 1980). A second view of understanding is that one understands X when one
subsumes X under some existing cognitive structure. We might call this representational understanding,
since it emphasizes the encoding of a problem (as opposed to the procedure for solving it). The
subsumption theory of understanding has been applied both to problem solving (Greeno, 1978, 1983;
Anderson, Greeno, Kline, & Neves, 1981) and to text understanding (e.g., Galambos, Abelson, & Black,
1986; Schank, 1986). Yet another view is that to understand a procedure is to know the justification for
the procedure. This conception of understanding is common among professional mathematicians. Both
teleological and justificatory understanding are crucial in the learning-by-being-told scenario, in which a
teacher demonstrates the execution of a procedure and then explains that procedure, i.e., verbally
communicates its teleology and its justification. A complete theory of understanding would specify the
nature and function of conceptual, teleological, representational, and justificatory understanding alike.
Michener (1978) has proposed such a multi-faceted view of mathematical understanding.

Knowledge is declarative rather than procedural

Current cognitive theory recognizes two kinds of knowledge, declarative knowledge and procedural
knowledge (Winograd, 1975). This distinction is essential to the theory proposed here. For instance,
consider the assertion that

Uppsala is ninety kilometers north of Stockholm.

This assertion is not a procedure; it does not specify how to accomplish any task. But it is relevant for
many different procedures⁴, such as "if you are in Uppsala and your goal is to get to Stockholm, then
travel south for ninety kilometers" and "if you are in Stockholm, and your goal is to get to Uppsala, then
travel north for ninety kilometers." The set of procedures for which an assertion is relevant is open-ended.
As an example of a less immediate consequence of the above assertion, consider the procedure "if you
are midway between Uppsala and Stockholm, and feel like getting as far away from both as possible,
then travel either straight west or straight east." The only limits on the set of procedures for which an
assertion is relevant are the limits on our imagination. As a second example, consider the following
arithmetic principle:

A set of numbers always yields the same sum, regardless of the
order in which they are added.

This principle does not in itself specify how to accomplish any particular task, but the set of procedures for
which it is relevant is open-ended. For instance, the above principle is crucial for the standard procedure
for multi-column subtraction because it enables regrouping of the minuend.
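A minimal Python sketch of how such a principle licenses regrouping, restricted (as an assumption for brevity) to two-digit numbers; `regroup` and `subtract_two_digit` are hypothetical names. Regrouping rewrites 52 as 40 + 12 instead of 50 + 2; because both sums have the same total, the column subtraction that follows still computes 52 - 17.

```python
def regroup(tens: int, ones: int) -> tuple[int, int]:
    """Borrow one ten: 10*tens + ones == 10*(tens - 1) + (ones + 10).
    The minuend is re-expressed as a different sum with the same total."""
    return tens - 1, ones + 10


def subtract_two_digit(minuend: int, subtrahend: int) -> int:
    """Column subtraction for 0 <= subtrahend <= minuend < 100, regrouping when needed."""
    t1, o1 = divmod(minuend, 10)
    t2, o2 = divmod(subtrahend, 10)
    if o1 < o2:  # minuend digit smaller than subtrahend digit: regroup first
        t1, o1 = regroup(t1, o1)
    return 10 * (t1 - t2) + (o1 - o2)


# Regrouping never changes the value of the minuend (checked for 10..99).
assert all(10 * rt + ro == n for n in range(10, 100) for rt, ro in [regroup(*divmod(n, 10))])
print(subtract_two_digit(52, 17))  # 35
```

The assertion in the block is the principle doing its work: if regrouping changed the total, the procedure would compute a different subtraction problem.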

Declarative and procedural knowledge differ along three dimensions. First, declarative knowledge is
goal-independent, while procedural knowledge is goal-related. Declarative knowledge is knowledge
about what the world is like, while procedural knowledge is knowledge about how to attain particular
objectives. Declarative knowledge is potentially useful in reaching an infinite range of goals, including
goals that the learner had never thought of at the time of storing the knowledge in memory.

Second, declarative knowledge is context-free while procedural knowledge is situated. Uppsala is
always ninety kilometers north of Stockholm; the distance is not a function of the current situation of the
person who is making use of this fact. But the procedure for getting to Uppsala by travelling ninety
kilometers northward is only useful if the person finds himself/herself in Stockholm; it does not lead to the
goal if executed in any other situation. Similarly, the sum of a set of numbers is unique; it is not a function
of the problem the agent is trying to solve. But the regrouping procedure is appropriate only with respect
to subtraction problems in which some minuend digit is smaller than the corresponding subtrahend digit.

Third, declarative knowledge is assertive or descriptive, while procedural knowledge is exhortational
or imperative. Declarative knowledge relates objects and events in the world to other objects or events,
while procedural knowledge relates situation/goal pairs to actions. Procedural knowledge is knowledge
about what to do in order to obtain some particular state of affairs. It is neither true nor false, but more
or less effective; executing a certain action in a particular situation will lead to attainment of the relevant
goal with more or less expenditure of time, cost, or effort.
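A toy Python sketch of the contrast. The single stored assertion (the hypothetical `FACTS` table) is goal-independent and context-free; the hypothetical `derive_action` turns it into situated situation/goal -> action rules on demand, and returns nothing for a situation the fact does not cover. This illustrates the distinction only; it is not a claim about how such knowledge is stored in memory.

```python
from typing import Optional

# Declarative: one goal-independent, context-free assertion.
# (place_a, place_b) -> (direction of place_a as seen from place_b, distance in km)
FACTS = {("Uppsala", "Stockholm"): ("north", 90)}

OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


def derive_action(situation: str, goal: str) -> Optional[str]:
    """Derive a situated, goal-related rule (situation/goal -> action) from the
    stored facts. Returns None when no fact is relevant to this pair."""
    for (a, b), (direction, km) in FACTS.items():
        if (situation, goal) == (b, a):
            return f"travel {direction} for {km} km"
        if (situation, goal) == (a, b):
            return f"travel {OPPOSITE[direction]} for {km} km"
    return None


print(derive_action("Stockholm", "Uppsala"))   # travel north for 90 km
print(derive_action("Uppsala", "Stockholm"))   # travel south for 90 km
print(derive_action("Gothenburg", "Uppsala"))  # None
```

One entry in `FACTS` serves two opposite goals, while each derived rule is useful in exactly one situation, mirroring the first two dimensions above.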

⁴A procedure typically consists of a (possibly very large) collection of rules. The single procedures discussed in this section
consist of just a single rule each.

August KUL-88-03 1988

Ohlsson & Rees    11    Rational Learning

In the research reported here we take the stance that the term "procedural knowledge" is, strictly
speaking, a misnomer.⁵ Procedures do not encode knowledge; they encode dispositions to act in
particular ways under particular circumstances. Hence, understanding cannot be encoded in action 
schemata, methods, operations, rules, or other procedural representations. The opposite stance is that 
all knowledge is procedural. For example, the Soar simulation model by Allen Newell and co-workers 
(Laird, Rosenbloom, & Newell, 1986) is built on the assumption that all knowledge is encoded in
production rules; the Soar system does not have any other representational format. A compromise
stance is that knowledge can be either procedural or declarative. For example, the ACT* model
(Anderson, 1976, 1983) is built on the assumption that there are separate memories for propositions and
for rules.

Understanding consists of principled rather than factual knowledge

Declarative knowledge can be divided into two types. Abstract or principled knowledge consists of
assertions about universals. The principle that the sum of a set of numbers is unique states something
about arithmetic sums in general. Factual knowledge, in contrast, consists of assertions about particular
objects or events. The statement that Uppsala is ninety kilometers north of Stockholm is an example of
factual knowledge. A factual assertion that refers to a particular spatiotemporal context is sometimes
classified as an instance of episodic knowledge.

Cognitive psychology has produced a wealth of information about the storage, retention, and retrieval
of factual, particularly episodic, information. However, the Conceptual Understanding Hypothesis
emphasizes principled rather than factual knowledge. The idea that we have explored in the research
reported here is that general principles can guide the construction of arithmetic procedures. Factual
knowledge is not foreign to arithmetic (for instance, "three is an odd number" is a factual assertion), but it is
less relevant for our current purpose than principled knowledge.

There are severe philosophical problems associated with the concept of principled knowledge. For
instance, since abstract properties of the world are not directly perceivable, the question arises how we
can have knowledge about them. Furthermore, since a general principle is not tied to a particular
spatiotemporal context, it is not clear what it means for such a principle to be either true or false. A
significant proportion of research in epistemology is devoted to clarifying these problems. However, the
research we report here does not presuppose solutions to the problems of philosophy. We are
investigating the psychological question of how the principles a student believes can guide his/her
procedure acquisition; we are not trying to decide whether he/she is justified in believing those principles.

The alternative hypothesis is that declarative knowledge consists mainly or exclusively of factual
knowledge. This hypothesis has the advantage of avoiding the philosophical problems associated with
principled knowledge. But we do not perceive a need to argue for the existence or the psychological
reality of principled knowledge as a preliminary to the research reported here. On the contrary, we expect
a conclusion about the usefulness of the concept of principled knowledge to be one of the outcomes of
our research.

Principles constrain the possible states of affairs 

Traditional debates about the nature of knowledge assume that knowledge consists either of 
descriptions ("All swans are white") or predictions ("The sun will rise tomorrow"). In this report we focus 
on a different aspect of principled knowledge. We view principles as constraints on the possible states of 
the world. An obvious example is the following common sense principle:

Two objects cannot occupy the same space at the same time. 

As a descriptive statement, this principle is not very informative; it does not tell us much about what the
world is like.⁶ Nor is it predictive; it does not by itself assert that such and such an event will happen.⁷
The impact of the above principle is to rule out certain states of affairs as impossible; it claims that
situations in which two material objects occupy the same physical space will not occur. Many physical
laws, e. g., laws of conservation, have the character of constraints (Feynman, 1965).

The notion of principled knowledge as consisting of constraints on the possible states of affairs is
particularly relevant for arithmetic. Arithmetic principles, e. g., the principles of commutativity and
associativity, do not predict which arithmetic operations will occur. Instead, they classify states of affairs
into mathematically valid and invalid states, as it were. For instance, the principle of commutativity of
addition claims that it cannot happen that we add two numbers in two different orders and get two
different answers.
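To make the idea of a principle as a classifier of states concrete, here is a minimal Python sketch. The `AdditionState` record and the function name are hypothetical illustrations, not part of the original model: the commutativity principle is encoded as a test that accepts or rejects a state, without predicting which additions will be performed or what their answers will be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdditionState:
    """Hypothetical state: the same two numbers added in both orders."""
    a: int
    b: int
    answer_ab: int  # answer obtained for a + b
    answer_ba: int  # answer obtained for b + a


def satisfies_commutativity(state: AdditionState) -> bool:
    """Rule out states in which the two orders gave different answers.
    The test says nothing about which additions will be carried out."""
    return state.answer_ab == state.answer_ba


valid = AdditionState(3, 5, answer_ab=8, answer_ba=8)
invalid = AdditionState(3, 5, answer_ab=8, answer_ba=7)  # e.g. a miscount
print(satisfies_commutativity(valid), satisfies_commutativity(invalid))  # True False
```

Note that the test also accepts a state in which both orders yield the same wrong sum (say 9 and 9): a constraint rules out inconsistent states, but it does not by itself compute the correct answer.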

An alternative hypothesis is that principled knowledge consists mainly of predictive principles (Holland,
Holyoak, Nisbett, & Thagard, 1986). We are not claiming that all principles can be formulated as
constraints. We would expect an exhaustive investigation into principled knowledge to reveal many
different kinds of principles. We do claim that constraints are frequent and particularly important in
arithmetic, a domain in which other types of principles, particularly predictive principles, are not relevant.

Hypotheses about performance 

Learning is a change in performance. Hence, specific hypotheses about learning presuppose
specific hypotheses about the nature of performance. The purpose of this subsection is to state our
hypotheses about the cognitive machinery that executes a procedure, and about the function of principled
knowledge in such execution.



⁶It contrasts in this regard with a principle like "planets travel in elliptical orbits," which does have descriptive content.

⁷It contrasts in this regard with a principle like the traditional Swedish saying that if the mennebomes turn bright red in the fall the
winter will be very cold, which does have predictive content. (The fact that an assertion has predictive content obviously does not
imply that it also has predictive accuracy.)
Thinking is heuristic search

We have chosen to work within the theory of thinking proposed by Newell and Simon (1972). The
basic idea of their theory is that humans think by searching a problem space. A problem space is defined
by (a) the initial state of the problem, (b) the ensemble of operators available for processing the problem,
and (c) the criterion for what counts as a goal state. Searching such a space means tentatively applying
operators to states in order to find a sequence of operators that leads from the initial state to the goal state.
The search is guided by heuristics, rules of the general form "when trying to obtain goal G, and the current
situation has properties P_1, P_2, ..., P_n, then consider action A." The reader is referred to the original
statement of the theory for details (Newell & Simon, 1972).
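The three components of a problem space, and the form of a search heuristic, can be sketched in Python as follows. The toy task (reach 10 from 0), the hypothetical operators `add1` and `add3`, the two heuristics, and the breadth-first control regime are all assumptions made for illustration; they are not tasks or control structures taken from the original theory.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

State = int

# (a) initial state, (b) operators, (c) goal criterion of a toy problem space
INITIAL: State = 0
OPERATORS: Dict[str, Callable[[State], State]] = {
    "add1": lambda s: s + 1,
    "add3": lambda s: s + 3,
}


def is_goal(s: State) -> bool:
    return s == 10


# Heuristics: "if the current state has property P, then consider action A"
HEURISTICS: List[Tuple[Callable[[State], bool], str]] = [
    (lambda s: 10 - s >= 3, "add3"),
    (lambda s: 10 - s >= 1, "add1"),
]


def search() -> Optional[List[str]]:
    """Breadth-first search that only tries operators proposed by a heuristic."""
    frontier = deque([(INITIAL, [])])
    seen = {INITIAL}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for applies, op_name in HEURISTICS:
            if applies(state):
                new = OPERATORS[op_name](state)
                if new not in seen:
                    seen.add(new)
                    frontier.append((new, path + [op_name]))
    return None


print(search())  # ['add3', 'add3', 'add3', 'add1']
```

The goal is given only as a test on states; the heuristics determine which operators are considered at all in a given state, which is what keeps the search away from states beyond 10.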

There are several reasons for selecting the theory of heuristic search as our performance theory.
First, we prefer building on previous research over inventing computational mechanisms ad hoc to suit
our current purpose. By choosing the main performance theory to emerge in recent research on thinking,
we integrate our efforts with other research efforts. Second, the theory of heuristic search is a general
theory. The mechanism of heuristic search is applicable to many task domains, not just to arithmetic. By
using a computational mechanism that has been applied to a wide range of tasks we increase the
plausibility of our theory. Third, the theory of heuristic search is precise enough to guide the construction
of a simulation model. Fourth, the theory of heuristic search is better grounded in psychological data than
any other current theory of human thinking. It has been used to explain why some problems are more
difficult than others (e. g., Kotovsky, Hayes, & Simon, 1985), why people perform differently on a
particular problem (e. g., Newell & Simon, 1972, Chaps. 7, 10, and 13), how procedures can be learned
from practice (e. g., Anzai & Simon, 1979), etc. In short, there is no other theory with comparable
generality, conceptual precision, and empirical grounding.

A further reason to select the hypothesis of heuristic search as our performance theory is that it 
satisfies the following criterion of adequacy: 

Criterion of Executability of Partial Procedures. Since procedure 
learning is gradual, the performance theory underlying a learning 
theory must enable a procedure to be executable at each stage 
during its construction. 

A cognitive procedure is not learned in an all-or-none fashion. Rather, the student learns some part of
the procedure, flounders, learns some more parts, makes mistakes, etc., in a gradual progression through
different stages of competence until the procedure is completed.⁸ But at each moment in time during this
gradual construction the learner is capable of acting, of executing the procedure as it exists at that point
in time. This observation constrains the possible theories about the human performance system to those
which enable procedures to be executable at each stage of completeness.



⁸At this point, further practice may lead to the discovery of short-cuts, memorization of special cases, elimination of redundancies,
chunking of steps that always follow each other, etc. In the research reported here we are concerned with the initial construction of
a procedure, rather than with its subsequent automatization.
The hypothesis of heuristic search satisfies the Criterion of Executability of Partial Procedures,
because the function of knowledge, according to this hypothesis, is to constrain search, and search can
be constrained to a higher or lesser degree. At the most constrained end of the scale the search follows
a single, unbranching path through the problem space. To an external observer the resulting behavior
looks algorithmic. At the other extreme, the problem space is searched by randomly selecting operators.
To an outside observer the resulting behavior looks like aimless floundering. A typical performance
during procedural learning is located somewhere between those extremes: The learner knows something
about how to search the relevant space, but not everything; hence, he/she proceeds in the general
direction of the solution, but makes mistakes along the way. A collection of search heuristics is always
executable, regardless of how incompletely it represents the target procedure. The resulting behavior
might be ineffectual, but it will be task oriented.

An alternative hypothesis to heuristic search is what we might call the problem reduction theory,
following the classification by Nilsson (1971) of problem solving methods into search methods and
problem reduction methods. The problem reduction theory says that a procedure consists of a hierarchy
of goals and subgoals. Each goal acts like a procedure call in an applicative programming language like
(pure) LISP. A call to a procedure (goal) is executed by calling its subprocedures (subgoals), which leads
to calls to its subprocedures (subgoals), etc., until the procedure called is a primitive operator that can be
executed without further reduction. In order for the problem reduction theory to satisfy the Criterion of
Executability of Partial Procedures, it must be augmented with a hypothesis about what happens when a
procedure call cannot be executed. The theory of repairs proposed by Brown and VanLehn (1980, 1982)
is such a hypothesis. Repair theory says that when a problem solver cannot execute a procedure call,
he/she edits the current control structure for the execution of that procedure in such a way that the
problematic procedure call is eliminated; normal execution then resumes.

Principles constrain search through state evaluation 

Given the choice of heuristic search as our performance theory, and given our focus on principled
knowledge, the research problem we have posed can be re-stated as follows: 

What role can principled knowledge play in a heuristic search 
system? How can principled knowledge improve performance and 
facilitate the revision of search heuristics? 

Heuristic search consists of the execution of actions in the pursuit of some goal in a particular context;
where do principles, context-free knowledge items that do not relate to goals and that do not mention
actions, impinge on that process? The hypothesis of heuristic search suggests two different functions for
knowledge: Knowledge can enter into the generation of search steps and/or it can enter into the
evaluation of search steps. In accordance with our decision to view principles as constraints, we focus on
the evaluative function. We envision principled knowledge as a device for internal self-monitoring of
performance. Since this is the central notion of our theory, we will expand it briefly here; more technical
details are provided in the section on the performance mechanism of our simulation model (The
performance mechanism, p. 19).

August 1988, KUL-88-03, Ohlsson & Rees, Rational Learning

We hypothesize that principles are encoded in memory as state constraints, criteria which a search
state has to satisfy in order to be valid or correct. A heuristic search mechanism can compare each
search state with those constraints, and decide whether it satisfies the constraints. States that violate
one or more constraints are inconsistent with the system's knowledge and should be avoided; they are
the results of incomplete or incorrect procedural knowledge. The collection of state constraints thus
constitutes a knowledge-based evaluation mechanism that enables the search system to monitor the
performance of its own procedural knowledge. For instance, an incomplete or incorrect arithmetic
procedure is likely to generate states of affairs that are not in accord with the laws of the number system.
A counting procedure that does not select a new object before generating the next number counts the
same object repeatedly, thereby violating the constraint that each object should be associated with
exactly one number. A regrouping procedure that performs a decrement without performing the
corresponding increment will change the value of the number being regrouped, thereby violating the
constraint that the value of the minuend should remain constant during subtraction. State constraints
enable a performance mechanism to catch itself, as it were, in making errors.
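A minimal sketch of the two constraints just mentioned, assuming that a counting state can be represented as a mapping from objects to the numbers assigned to them, and that a two-column minuend can be represented as a (tens, ones) pair. The function names and the `select_new_object` flag are hypothetical and do not come from the original model.

```python
from typing import Dict, List


def one_number_per_object(assignments: Dict[str, List[int]]) -> bool:
    """Counting constraint: every object counted so far carries exactly one number."""
    return all(len(numbers) == 1 for numbers in assignments.values())


def count_objects(objects: List[str], select_new_object: bool) -> Dict[str, List[int]]:
    """Hypothetical counting procedure; with select_new_object=False it never
    moves on to a new object before generating the next number."""
    assignments: Dict[str, List[int]] = {}
    current = objects[0]
    for number in range(1, len(objects) + 1):
        assignments.setdefault(current, []).append(number)
        if select_new_object and number < len(objects):
            current = objects[number]  # the next uncounted object
    return assignments


def minuend_value(tens: int, ones: int) -> int:
    """Value of a two-digit minuend written as tens and ones columns."""
    return 10 * tens + ones


correct = count_objects(["a", "b", "c"], select_new_object=True)
faulty = count_objects(["a", "b", "c"], select_new_object=False)
print(one_number_per_object(correct), one_number_per_object(faulty))  # True False

original = minuend_value(5, 2)       # 52
regrouped_ok = minuend_value(4, 12)  # decrement tens, add ten to ones: still 52
regrouped_bad = minuend_value(4, 2)  # decrement without the increment: 42
print(regrouped_ok == original, regrouped_bad == original)  # True False
```

In both cases the constraint does not tell the procedure what to do next; it only flags the state produced by the faulty procedure as inconsistent.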

The hypothesis that the function of principled knowledge is to evaluate search states satisfies the
Criterion of Executability of Partial Procedures. The search procedure may be more or less effective, but
at each level of effectiveness it is possible to classify the search states generated as either consistent
with the available constraints or as violating them. If the search procedure is nearly complete and correct,
then there will be few states that violate the system's constraints; if it is radically incomplete or incorrect,
then many search states will cause constraint violations. But the system is executable regardless of the
level of completeness of its procedural knowledge. Principled knowledge can also be more or less
complete. If the system knows many constraints, then a large proportion of the invalid states will be
caught, as it were. If the system knows only a few of the relevant constraints, then invalid states will slip
through, possibly resulting in a wrong answer. But the computational mechanism does not cease to
function in the presence of incomplete knowledge.

The alternative hypothesis is that principled knowledge impinges on heuristic search in the
generation, rather than in the evaluation, of search steps. This hypothesis is intuitively plausible, and it is
implicitly presupposed in many analyses of human thinking, e. g., in analyses of scientific problem solving
(e. g., Jones & Langley, 1988), medical reasoning (e. g., Patel & Groen, 1986), etc. There is no reason to
expect knowledge to have a single function in thinking and learning. Human beings obviously use
knowledge both in generating ideas about what to do and in evaluating the outcomes of their actions. A
complete cognitive theory must explain both the generative and the evaluative functions of principled
knowledge.

Hypotheses about learning 

A theory of learning has two questions to answer. First, when does cognitive change occur? When
will the performance machinery roll on unchanged, and when will it undergo revision? Second, what
change will occur? Given the mental conditions that trigger learning, which knowledge structure will be
revised, and how will it be revised?
Constraint violations trigger procedure revision

What events trigger the internal change mechanisms? Given a heuristic search system which is
equipped with a collection of state constraints and which can monitor its own performance by comparing
search states with those constraints, it is natural to assume that learning is triggered by constraint
violations. The constraints (the principled knowledge of the system) enable the system to know that its
procedure is incorrect and that revisions are needed. If the search procedure is correct and complete, it
should never generate a state that violates any constraint. A constraint violation indicates that the
procedure is faulty, and should be revised in such a way that application of that procedure in the future
will not lead to further constraint violations.

Many alternative hypotheses about the mental conditions that trigger learning are possible. Some
learning theories assume that learning is continuous. For instance, Neves and Anderson (1981, p. 73)
investigated the assumption that whenever two procedural rules are applied in sequence, the procedure
is extended with the composition of those two rules. Traditional S-R theories (Neimark & Estes, 1967) as
well as connectionist learning theories (Hinton, 1987) also assume that learning happens on every trial.
Other learning theories tie learning to the goal structure of the procedure being executed. For instance,
the UPL model (Ohlsson, 1983a, 1987a) and the Soar model (Laird, Rosenbloom, & Newell, 1986) both
learn when they succeed in satisfying a subgoal. A different triggering criterion was proposed by Neches
(1981, 1982, 1987). His model of heuristic procedure modification is based on the assumption that
learning is triggered by the discovery of patterns in the internal trace of a procedure, patterns that indicate
that there are redundancies in the procedure that can be eliminated. The formulation of the triggering
condition for a particular theory obviously depends on the knowledge structures postulated by that theory.

Constraint violations inform procedure revisions 

Given that the current search procedure has generated a search state that violates a constraint, what
change should occur? We postulate that a constraint violation not only signals that a revision is needed,
but also contains information about how the faulty procedure should be revised. We propose that
the required change can be derived from the system's knowledge. We have called this idea the Rational
Learning Hypothesis in previous work (Ohlsson, 1987b; Ohlsson & Rees, 1987), because it claims that
the learning mechanism has rational grounds for the change that it brings about.

The learning mechanism of our simulation model can identify the circumstances that lead to a
constraint violation, and revise the relevant rule in the appropriate way. A precise statement of the
algorithm that accomplishes this will be given in the next section. The basic idea is as follows. Suppose
that state S_1 is consistent with all available state constraints, but that operation A transforms S_1 into state
S_2, which does violate a constraint C. The cause of the violation is then to be found in the changes A
caused in S_1. By looking at those changes, and relating them to the violation, we can pinpoint the
reason why executing A in S_1 led to the violation of a constraint. The rule that applied A can then be
revised in such a way that it recognizes situations in which A will have the effect of violating that
constraint, and avoids executing A in those situations.

The hypothesis of rational learning contrasts with the two alternative hypotheses of learning by
induction and learning by analogy. The dominant hypothesis in current learning theory is that new
cognitive structures are constructed by identification of the commonalities of a set of examples. When the
examples are successful problem solving steps, the inductive hypothesis becomes a theory of learning
through practice. A number of variations on this theme have been explored (see the collections of articles
edited by Anderson, 1981; by Bolc, 1987; and by Klahr, Langley, & Neches, 1987). Another alternative
hypothesis is that humans learn primarily from factual or episodic knowledge. The solution to a novel
problem is hypothesized to be constructed by remembering the solution to some previously solved
problem, which is then edited, as it were, to fit the new problem. The hypothesis of learning by analogy
has been explored by a number of researchers (Adelson, Gentner, Hammond, Holyoak, & Thagard, 1988;
Carbonell, 1982, 1983; Gentner, 1987; Holyoak, 1984; Rumelhart & Norman, 1981). Human beings are
also capable of learning by being told (Hayes-Roth, Klahr, & Mostow, 1981). Inductive learning,
analogical learning, and learning by being told are all important psychological processes that will have to be
included in a complete theory of learning.

Summary of hypotheses 

The theory of principled knowledge and its role in performance and learning that constitutes the basis
of the computer model that we describe in this report can be summarized as follows:

• Hypotheses about the nature of understanding:

• Conceptual understanding of a procedure consists of knowledge about the task
environment in which the procedure operates (rather than of the teleological semantics
of the procedure).

• Knowledge is declarative, i. e., goal-independent, context-free, and assertive (rather
than procedural).

• The type of declarative knowledge that is essential for procedural learning is
knowledge of general principles (rather than knowledge of facts and episodes).

• Principles constrain the possible states of affairs (rather than describe or predict).

• Hypotheses about performance:

• A cognitive performance is a heuristic search through a problem space (rather than a
problem reduction).

• Procedural knowledge consists of collections of search heuristics (rather than of
collections of subgoaling rules).

• The function of principled knowledge in a heuristic search system is to facilitate the
evaluation of search states (rather than to facilitate the generation of search states).

• Hypotheses about learning:

• Learning is triggered when an incorrect or incomplete procedure generates a search
state that violates one or more principles of the relevant domain (rather than, for
instance, when two related rules fire in sequence).

• A faulty procedural rule is revised on the basis of information in the learner's principled
knowledge (rather than on the basis of the information in a collection of instances).

As we have indicated in the presentation of these hypotheses, alternative hypotheses are possible
with respect to each issue. In principle, each constellation of hypotheses defines a possible cognitive
model.⁹ The particular choices we made in constructing the above theory were guided by our purpose of
constructing a computational interpretation of the Conceptual Understanding Hypothesis. The next
section describes a computer implementation of these hypotheses (A Computer Model, p. 19), and a later
section describes the application of that model to the learning of arithmetic (Computational Results, p.
28).



⁹In practice, the design choices are not completely modular. A choice with respect to one issue sometimes limits the choices with
[...] performance theory, we are forced to assume that [...] generation or the evaluation of search states; there are no other options within that performance
theory. The view of psychological theory construction as proceeding through successive decisions with respect to a set of [...]






A Computer Model

The theory presented in the previous section can be viewed as an abstract specification of an
information processing system. A computer model of the theory is a runnable program that satisfies that
specification. Implementation involves inventing computational mechanisms that work in accordance with
the principles of the theory. We have implemented the Heuristic Searcher (HS), a computer model of the
theory presented above. We first describe the performance system of the model and then its learning
mechanism.

The performance system 

HS is a production system architecture¹⁰ augmented with a representation for principled knowledge.
The system operates by searching a problem space. It selects an as yet unexpanded search state, and
applies its current procedure to that state, thereby generating one or more new states. Search states are
evaluated on the basis of their consistency with the system's principled knowledge.

Representation for procedural knowledge

A procedure in HS consists of a collection of production rules. The condition of a production rule is
matched against the current search state. The action of a production rule consists of a single problem
solving operator. An operator consists, in turn, of a deletion list and an addition list. When the operator is
executed, the expressions in the deletion list are deleted from the current state and the expressions in the
addition list are added, thereby creating a new search state.
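The rule and operator format described above can be sketched in Python as follows, under the simplifying assumption that a state is a set of ground facts (tuples) and that a rule condition is a set of facts that must all be present. HS matched conditions with a pattern matcher; this sketch does not model pattern variables, and all class names and facts are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Fact = Tuple[str, ...]   # a ground expression, e.g. ("pointing-at", "a")
State = FrozenSet[Fact]  # a search state is a set of such expressions


@dataclass(frozen=True)
class Operator:
    name: str
    deletion_list: FrozenSet[Fact]
    addition_list: FrozenSet[Fact]

    def apply(self, state: State) -> State:
        """Remove the deletion list, then add the addition list, giving a new state."""
        return (state - self.deletion_list) | self.addition_list


@dataclass(frozen=True)
class ProductionRule:
    condition: FrozenSet[Fact]  # simplification: facts that must all be present
    operator: Operator          # the single operator the rule proposes

    def matches(self, state: State) -> bool:
        return self.condition <= state


s0: State = frozenset({("pointing-at", "a"), ("uncounted", "b")})
point_at_b = ProductionRule(
    condition=frozenset({("pointing-at", "a"), ("uncounted", "b")}),
    operator=Operator(
        "point-at-b",
        deletion_list=frozenset({("pointing-at", "a")}),
        addition_list=frozenset({("pointing-at", "b")}),
    ),
)
s1 = point_at_b.operator.apply(s0) if point_at_b.matches(s0) else s0
print(sorted(s1))  # [('pointing-at', 'b'), ('uncounted', 'b')]
```

Because the only way to change a state is through an operator's deletion and addition lists, every computation in this sketch corresponds to one step from one search state to a new one; the old state `s0` is left intact and can still be revisited by the search.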

Production rules encode search heuristics. The intended interpretation of rule R -> O is "if the
current search state has property R, then consider operator O." There is no distinction in HS between
search procedures and other kinds of procedures. An algorithm is a search procedure that is constrained
enough to generate a single path through the problem space. Since the action side of the production rule
consists of a problem solving operator, a production rule cannot write, edit, or delete expressions
arbitrarily from working memory. Each computation performed has to correspond to a step through the
problem space.

Representation for principled knowledge

Principles are represented in the HS system as state constraints. A state constraint C is an ordered
pair <C_r, C_s> of patterns, each pattern similar to the condition of a production rule. The left-hand pattern
C_r is called the relevance pattern, because it determines the class of search states to which the constraint
is relevant. The right-hand pattern C_s is called the satisfaction pattern, because it encodes the criterion
that a state must match in order to satisfy the constraint (given that the relevance pattern matches). The
relevance and satisfaction patterns are matched against the search states with the same pattern matcher
that matches the production rule conditions. No new computational machinery has to be postulated in
order to augment a production system architecture with this knowledge representation.

¹⁰Production systems consist of collections of condition-action rules that are executed by (i) comparing their conditions with the
[...] memory, (ii) selecting one or more of those [...], and (iii) evoking the actions of the selected rule(s). Production systems were first
[...] Newell and Simon, 1972; Klahr, Langley, & Neches, 1987; Laird, Rosenbloom, & Newell, 1986). Computer implementations [...]

To illustrate the difference between the relevance pattern and the satisfaction pattern, consider the
general principle "traffic should keep to the right side of the road." This principle is violated if a person finds
himself or herself on the left side of the road while driving. If the person is not driving, the principle is
irrelevant. The HS system would encode this principle as "if HS is driving, then HS ought to be on the right
side of the road." If the current state does not contain the information that HS is driving, then the
relevance pattern of the constraint does not match and the constraint is irrelevant. If the constraint is
relevant, then two cases are possible. Either the current state contains the information "HS is on the right
side of the road," in which case the satisfaction pattern matches and the constraint is satisfied, or else the
constraint is violated.
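The ordered pair <C_r, C_s> and its three possible outcomes can be sketched in Python as follows. This is a minimal illustration, assuming that states are sets of ground facts (tuples) and that each pattern is a set of facts that must all be present; HS's pattern matcher is not modeled, and the class and fact names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple

Fact = Tuple[str, ...]
State = FrozenSet[Fact]


class Status(Enum):
    IRRELEVANT = "irrelevant"
    SATISFIED = "satisfied"
    VIOLATED = "violated"


@dataclass(frozen=True)
class StateConstraint:
    relevance: FrozenSet[Fact]     # C_r: when does the constraint apply?
    satisfaction: FrozenSet[Fact]  # C_s: what must then hold?

    def evaluate(self, state: State) -> Status:
        if not self.relevance <= state:
            return Status.IRRELEVANT
        if self.satisfaction <= state:
            return Status.SATISFIED
        return Status.VIOLATED


keep_right = StateConstraint(
    relevance=frozenset({("driving", "HS")}),
    satisfaction=frozenset({("on-right-side-of-road", "HS")}),
)
print(keep_right.evaluate(frozenset({("walking", "HS")})))  # Status.IRRELEVANT
print(keep_right.evaluate(frozenset({("driving", "HS"), ("on-right-side-of-road", "HS")})))  # Status.SATISFIED
print(keep_right.evaluate(frozenset({("driving", "HS"), ("on-left-side-of-road", "HS")})))  # Status.VIOLATED
```

The two patterns are checked with the same subset test that a rule condition would use, which mirrors the point that no new matching machinery is needed.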



The operating cycle 

The system takes one step forward in the problem space during each cycle of operation. A cycle
begins by HS selecting an as yet unexpanded search state as the current state. The content of that state
then becomes the effective working memory for that cycle. There is no other working memory than the
selected search state. The system then matches all production rules in the current procedure against the
selected state. One or more of those rules are evoked, and one or more new states generated. The
system then matches its constraints against each new state, and records which constraints, if any, are
violated by that state.

The reaction to a constraint violation depends upon whether the system is run in performance mode
or in learning mode.¹¹ In performance mode HS executes a best-first search with the number of
constraint violations as the cost function. The cost of a path is thus interpreted as the degree to which that
path contradicts the system's principled knowledge, rather than as the amount of computational effort
required to generate the path. Consequently, HS prefers solution paths that are more congruent with its
principled knowledge over those that are less congruent.

In learning mode HS executes a breadth-first search, because it stops to learn as soon as it
encounters a search state that violates a constraint. If a state violates some constraint, HS applies its
learning mechanism to the rule that produced the constraint violation, thereby revising it. If there is more
than one constraint violation, HS selects one of them at random to learn from. After revising a rule, HS
backs up to the initial state and tries anew to solve the current problem.
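The two modes can be sketched in Python for a toy counting task. Everything here is an illustrative assumption rather than the HS implementation: the state encoding (index of the object pointed at, plus the number assignments made so far), the two deliberately under-constrained rules, and the single constraint "each counted object carries exactly one number." Performance mode is rendered as a best-first search whose path cost is the summed violation count of the states on the path (one reading of "number of constraint violations as the cost function"); learning mode is a breadth-first search that stops at the first violating state and blames the last rule to fire. The rule revision itself is not sketched.

```python
import heapq
from collections import deque
from itertools import count as tie_counter
from typing import Callable, List, Optional, Tuple

OBJECTS = ("a", "b", "c")
# Hypothetical state: (index of the object pointed at, ((object, number), ...))
State = Tuple[int, Tuple[Tuple[str, int], ...]]
Rule = Tuple[Callable[[State], bool], Callable[[State], State]]


def say_next_number(s: State) -> State:
    """Operator: assign the next number to the object pointed at."""
    ptr, pairs = s
    return (ptr, pairs + ((OBJECTS[ptr], len(pairs) + 1),))


def point_at_next(s: State) -> State:
    """Operator: move the pointer to the next object."""
    ptr, pairs = s
    return (ptr + 1, pairs)


# An incomplete procedure: neither condition checks whether the
# current object has already been counted.
RULES: List[Rule] = [
    (lambda s: s[0] < len(OBJECTS), say_next_number),
    (lambda s: s[0] + 1 < len(OBJECTS), point_at_next),
]


def violations(s: State) -> int:
    """Number of counted objects that carry more than one number."""
    counted = [obj for obj, _ in s[1]]
    return sum(1 for obj in set(counted) if counted.count(obj) > 1)


def is_goal(s: State) -> bool:
    return len(s[1]) == len(OBJECTS)


def best_first(initial: State) -> Optional[Tuple[int, List[State]]]:
    """Performance mode: always expand the path with the fewest violations so far."""
    tie = tie_counter()  # tie-breaker so the heap never compares states
    frontier = [(violations(initial), next(tie), initial, [initial])]
    while frontier:
        cost, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return cost, path
        for condition, operator in RULES:
            if condition(state):
                new = operator(state)
                heapq.heappush(frontier, (cost + violations(new), next(tie), new, path + [new]))
    return None


def first_violation(initial: State) -> Optional[Tuple[State, int, State]]:
    """Learning mode: breadth-first until a state violates the constraint.
    Returns (S_1, index of the rule that fired, S_2); that rule is the one to revise."""
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        for index, (condition, operator) in enumerate(RULES):
            if condition(state):
                new = operator(state)
                if violations(new) > 0:
                    return state, index, new
                if not is_goal(new):
                    frontier.append(new)
    return None
```

With these rules, `best_first((0, ()))` returns a zero-cost path ending in the state `(2, (('a', 1), ('b', 2), ('c', 3)))`, whereas `first_violation((0, ()))` stops as soon as the counting rule gives object a a second number, and returns that rule's index, 0. Because learning mode halts at the first violating state, every earlier state on the path is valid, so blaming the last rule to fire is safe in this sketch.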



¹¹[...] be used in cognitive diagnosis will be reported [...]
The HS learning mechanism

A mechanism for procedural learning performs some editing operation on a procedure in order to
improve it. The HS learning mechanism operates by replacing single production rules with more
constrained rules. Hence, it is a form of discrimination learning (Langley, 1983b, 1985, 1987). HS learns
while doing, i. e., the learning mechanism operates in the context of the current state of the heuristic
search. A mechanism for learning while doing must contain a specification of when (under what
conditions) to pause and revise the procedure (the triggering problem), a criterion for which knowledge
item to revise (the assignment of blame problem), and an algorithm for how to revise that item (the
revision problem).

The triggering problem 

Constraint violations indicate that the system's current procedure is not congruent with what the
system knows about the task environment. Consequently, HS learns when it generates a search state
that violates one or more state constraints. When a constraint violation occurs, the system terminates the
current effort to solve its problem, applies its revision algorithm (see below), and then starts over from the
initial state of the problem.

The assignment of blame problem 

Given that the learning mechanism has been triggered, which rule should it revise? Which rule is to
blame for the generation of the invalid state? The construction of the HS system implies that the
constraint violation was produced by the rule that fired the operator that led to the current state. This is
shown by the following argument. Suppose that some operator further back in the search path generated
an invalid state. That state would then have triggered the learning mechanism, HS would have revised
the rule that led to that state, and started over from the initial state. It would never have generated the
current state. Hence, all states preceding the current state are valid. The rule to revise is therefore the
last rule to fire before the current state.¹²

The revision problem 

Given a constraint violation the HS system tries to revise the rule that led to the violation in such a
way that future applications of that rule will not lead to violations of that constraint. The revision problem
can be stated as follows:



¹²[...] are principled errors. The argument does not hold in domains where there are [...]
nevertheless is not on the correct solution path. This point is discussed further in a later section (see p. 74).




Let S_1 and S_2 be two consecutive states in a search path. Hence,
some production P with condition R was evoked in S_1 and fired
some operator O with deletion list O_d and addition list O_a, thereby
producing state S_2. Assume that S_2 violates constraint C with
relevance pattern C_r and satisfaction pattern C_s, i. e., that S_2
matches the relevance pattern but not the satisfaction pattern.
What is the appropriate revision of production P?

Since HS learns as soon as it encounters a state that violates a constraint, $S_1$ does not violate C (or
any other constraint). Hence, there are two types of constraint violations. In a Type A violation, C is
irrelevant in $S_1$, and it becomes relevant but not satisfied as the result of the application of operator O. In
a Type B violation, C is both relevant and satisfied in $S_1$, and remains relevant but becomes unsatisfied as
the result of the application of O. We discuss the revision needed to handle the first type of constraint
violation in detail.
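Under the same ground-pattern assumption, the two violation types can be told apart by looking only at $S_1$. The helper `violation_type` below is a hypothetical sketch, not HS code. It first confirms the preconditions stated above ($S_2$ violates C and $S_1$ does not) and then checks whether C was already relevant before the step.

```python
def violation_type(c_rel, c_sat, s1, s2):
    """Classify the violation of a ground constraint C = (C_r, C_s) caused by the
    step S1 -> S2 as Type "A" or Type "B"."""
    def violates(state):
        return c_rel <= state and not c_sat <= state

    if not violates(s2):
        raise ValueError("S2 does not violate C")
    if violates(s1):
        raise ValueError("S1 already violates C; learning would have happened earlier")
    # S1 is valid, so C was either irrelevant there (Type A)
    # or relevant and satisfied (Type B, which needs a deletion).
    return "A" if not c_rel <= s1 else "B"


kind = violation_type({"r"}, {"s"}, frozenset(), frozenset({"r"}))  # C becomes relevant
```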

Revision algorithm for a Type A violation. To repeat, in a Type A violation C is irrelevant in state $S_1$,
and it becomes relevant but not satisfied in state $S_2$ as the result of the execution of operator O. If the
relevance pattern $C_r$ does not match $S_1$ but does match $S_2$, then the effect of executing operator O must
have been to create expressions that enabled $C_r$ to match. But since, ex hypothesi, the constraint C is
violated in $S_2$, O did not create the expressions needed to complete the match for the satisfaction pattern
$C_s$. This situation warrants two different revisions of the rule P that fired O. First, the condition of P
should be revised so that the revised rule (call it P') only fires in situations in which O will not complete
the relevance pattern for C. Second, the condition of P should be revised so that the revised rule (call it
P'') only fires in those situations in which both the relevance and the satisfaction patterns of C will
become completed. The details of the two rule revisions are as follows:

• Revision 1. Ensuring that the constraint is not relevant. The purpose of the first revision is to
avoid constraint violation by preventing constraint C from becoming relevant when operator
O is executed. O will complete $C_r$ when the parts of $C_r$ that are not added by O are already
present in $S_1$. Those parts are given by $(C_r - O_a)$, where the symbol "-" signifies set
difference. To limit the execution of O to situations in which it will not complete $C_r$, we
augment the condition of P with the negated expression

$$\mathrm{not}\,(C_r - O_a)$$

In summary, if the expression $(C_r - O_a)$ matches the current state, then executing O will make
C relevant, so we execute O only in situations in which that conjunction does not match.^13
The new rule created is:

$$P': R \;\&\; \mathrm{not}\,(C_r - O_a) \rightarrow O$$

• Revision 2. Ensuring that the constraint is satisfied. The purpose of the second rule revision
is to avoid constraint violation by forcing constraint C to become both relevant and satisfied
when O is executed. To guarantee that $C_r$ will become complete, we augment the condition
with the conjunction

$$(C_r - O_a)$$

To guarantee that $C_s$ will also become complete, we augment the condition of P with a
conjunction that contains the parts of $C_s$ that are not added by O. They are given by

$$(C_s - O_a)$$

Hence, the desired effect is achieved by appending the expression

$$(C_r - O_a) \cup (C_s - O_a)$$

to the condition of P, where the symbol $\cup$ signifies set union. If this expression is present in
the condition of a rule evoking O, then O is guaranteed to make the constraint C both
relevant and satisfied. The new rule created is:

$$P'': R \cup (C_r - O_a) \cup (C_s - O_a) \rightarrow O$$

^13 The notation we use to describe the revision algorithm mixes set-theoretic notions like set difference with logical notions like
negation. This should not cause any difficulties, because there is an obvious one-one mapping between sets of expressions and
conjunctions of expressions: the set of expressions $\{E_1, E_2, \ldots, E_n\}$ corresponds to the conjunction $(E_1 \,\&\, E_2 \,\&\, \ldots \,\&\, E_n)$.

Summary of revision algorithm for Type A violations. If rule P with condition R evokes operator O in
some state in which constraint C is irrelevant, thereby creating a new state in which constraint C is
relevant but not satisfied, then we replace rule

$$P: R \rightarrow O$$

with the two rules

$$P': R \;\&\; \mathrm{not}\,(C_r - O_a) \rightarrow O$$

and

$$P'': R \cup (C_r - O_a) \cup (C_s - O_a) \rightarrow O,$$

where "&" signifies conjunction, "-" signifies set difference and $\cup$ signifies set union. The first rule limits
the application of O to situations where C will not become relevant. The second rule evokes O in
situations where C will become both relevant and satisfied. The section on computational results
(Computational Results, p. 28) contains several detailed examples of how this learning algorithm
functions in the context of learning arithmetic procedures.
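With ground patterns the Type A revision reduces to two set differences. The sketch below assumes no variables, no deletions and no negated conditions (the simplifications listed next), and it represents P' as a pair (positive conditions, negated conjunction). The function name `revise_type_a` is hypothetical, and returning `None` for P' when $C_r - O_a$ is empty is one reading of item (e) below, not a documented HS behavior. The worked example uses made-up ground facts modelled on rule 6 of the initial counting rules and the constraint that the answer must be associated with some object.

```python
def revise_type_a(condition, o_add, c_rel, c_sat):
    """Split rule P: R --> O after a Type A violation of constraint C = (C_r, C_s).

    Hypothetical sketch for ground patterns without deletions or negations.
    Returns (p_prime, p_double_prime):
      p_prime: (R, C_r - O_a), read as  R & not(C_r - O_a) --> O,
               or None when C_r - O_a is empty (O would always make C relevant).
      p_double_prime: R | (C_r - O_a) | (C_s - O_a), read as a plain conjunction --> O.
    """
    r = frozenset(condition)
    missing_rel = frozenset(c_rel) - frozenset(o_add)  # parts of C_r that O does not add
    missing_sat = frozenset(c_sat) - frozenset(o_add)  # parts of C_s that O does not add
    p_prime = (r, missing_rel) if missing_rel else None
    p_double_prime = r | missing_rel | missing_sat
    return p_prime, p_double_prime


# Worked example with hypothetical ground facts: Assert(n3) fired by rule 6 meets
# the constraint "the answer is one of the numbers associated with some object".
p1, p2 = revise_type_a(
    condition={("Current", "n3")},
    o_add={("Answer", "n3")},
    c_rel={("Answer", "n3")},
    c_sat={("Associate", "x1", "n3")},
)
# p1 is None: asserting n3 always makes the constraint relevant, so only P'' remains.
# p2 requires ("Associate", "x1", "n3") before n3 may be asserted as the answer.
```

Any state that contains the P'' condition, once O's addition list is added, contains all of $C_r$ and $C_s$, which is the guarantee claimed for the second revision.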

The above description of the HS revision algorithm is simplified in the following respects: (a) We have
not described the revisions needed to handle Type B violations, i.e., violations in which C is both relevant
and satisfied in $S_1$, and becomes relevant but not satisfied in $S_2$ as the result of operator O. (b) In order
to add parts of a constraint to a rule condition, correspondences must be established between the
variables in the constraint and the variables in the rule. HS computes those correspondences by
comparing the current instantiation of the rule to the current instantiation of the constraint.^14 (c) We have
not described the case in which the operator O deletes expressions. (d) Negated conditions can occur
both in production rules and in constraints. A negated condition can cease to match as the result of the
addition of expressions to a search state, and the analysis has to be revised accordingly. (e) There are
cases in which either of the two revisions results in the empty list of new conditions. In those cases only
one new rule is created.

^14 The way in which HS handles Type B violations and how it solves the problem of variable names are described in Ohlsson &
Rees (1987).

The definition of the learning algorithm is heavily influenced by the particular knowledge 
representation that we have chosen for the HS model. The key feature of the knowledge representation 
is the split between the relevance pattern and the satisfaction pattern. This feature implies the existence 
of two different revisions of the faulty rule, one that ensures that the constraint does not become relevant 
inappropriately, and one that ensures that the constraint is satisfied whenever it is relevant. If we had 
chosen a different representation for principled knowledge, we would have defined a different learning 
algorithm. 

Discussion 

HS shares certain structural features with other production system architectures used to simulate
cognitive processes.^15 Each system consists of an interpreter that matches a set of condition-action rules
against a working memory, and selects one or more of the satisfied rules for execution. However,
production systems differ with respect to the syntax of rules, the details of the matching process, the
number of working memories, the method of conflict resolution, the learning mechanisms, etc. Four
central features distinguish HS from other architectures: the simple mapping between architectural 
concepts and problem solving concepts, the separate representation for principled knowledge, the trade- 
off between generative and evaluative selectivity, and the rational leaming mechanism. Each feature will 
be discussed in turn. 



The first central feature of the HS system is that the architecture has been designed in accordance
with the performance theory we are using. Constructs such as operating cycle, production rule, working
memory, and conflict resolution belong to the theory of information processing systems (Newell & Simon,
1972, Chap. 2). A specification in terms of these constructs defines a particular, general-purpose
information processing system. Although production system architectures are typically used to model
human cognitive processes, they could, in principle, be used as general purpose programming
languages. Constructs such as search, heuristic, problem space, operator, search state, and evaluation
function, on the other hand, belong to the theory of problem solving (Newell & Simon, 1972, Chap. 2 and
3). A heuristic search system could, in principle, be implemented in any general purpose programming
language such as Fortran or Lisp. There is no intrinsic connection between the concept of a production
system and the concept of heuristic search. 



^15 Production system architectures of the type to which HS belongs are sometimes called neo-classical in order to distinguish
them from so-called baroque production systems used in expert systems research (Davis and King, 1976). The main difference
between the two types of architectures is that in neo-classical systems the production rule is a procedural construct, while in
baroque systems the production rule is a data-unit that is interpreted by unrestricted Lisp procedures. Neo-classical production
system languages are descendants of the PSG system developed by Newell (1972, 1973). They have recently been reviewed
by Neches, Langley, & Klahr (1987). See also Langley (1983a) for an analysis of the space of production system architectures.
However, Newell (1980) has proposed the hypothesis that problem solving is a fundamental category
of human cognition, i.e., that all central cognitive processes take the form of problem solving processes;
problem solving is the activity of the human cognitive architecture. This hypothesis implies that there
should be a close relation between architectural constructs and problem solving constructs in models of
human cognition. The mapping between architectural constructs and problem solving constructs is
particularly straightforward in the HS system:

• Production rules correspond to search heuristics. The action of a production rule is
constrained to be a single problem solving operator. Production rules cannot arbitrarily
revise the contents of working memory. It is impossible to fire a production rule without
taking a step through the problem space.

• The working memory is the current search state. The system has no working memory that is
independent of the search process.

• Conflict resolution is done by state evaluation. All production rules that match the current
state are evoked in parallel, thereby generating all possible descendants of the current state.
A state is selected for expansion on the basis of its value on an evaluation function. There is
no architectural process of conflict resolution that is independent of the problem solving
process.

• An operating cycle consists of selecting a search state, matching the rules against that state,
evoking all satisfied rules, and computing the evaluation function for each new state
generated. In each operating cycle the system takes one step through the problem space.
The system does not perform any other kind of computation.
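The operating cycle described in the list can be sketched as a best-first search loop. This is a hypothetical illustration, not the HS implementation: rules are modelled as Python functions that return successor states, lower evaluation values are assumed to be better, and the parallel evocation of rules is simulated by calling each rule in turn. Each pass through the loop selects one state and generates all of its descendants.

```python
import heapq
from itertools import count


def hs_search(initial, rules, evaluate, is_goal, max_cycles=1000):
    """Best-first sketch of the HS operating cycle.

    rules:    functions state -> list of successor states (one operator each)
    evaluate: state -> number; lower values are assumed to be more promising
    """
    tie = count()  # tie-breaker so that states themselves are never compared
    frontier = [(evaluate(initial), next(tie), initial)]
    seen = {initial}
    for _ in range(max_cycles):
        if not frontier:
            return None
        _, _, state = heapq.heappop(frontier)  # select a state by its evaluation
        if is_goal(state):
            return state
        for rule in rules:  # evoke every matching rule (sequentially here)
            for child in rule(state):
                if child not in seen:
                    seen.add(child)
                    heapq.heappush(frontier, (evaluate(child), next(tie), child))
    return None


# Toy usage: reach 10 from 1 with "add one" and "double" as the only rules.
result = hs_search(1, [lambda s: [s + 1], lambda s: [s * 2]],
                   evaluate=lambda s: abs(10 - s), is_goal=lambda s: s == 10)
```

Conflict resolution in this sketch is nothing but the priority queue ordering, which mirrors the claim that HS has no separate architectural conflict-resolution process.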

In short, HS is an information processing architecture that has been designed in accordance with a 
particular theory of problem solving. The mapping between architectural constructs and problem solving
constructs is similar in intent, but not identical in its details, to the corresponding mapping in the Soar
system (Laird, Rosenbloom, & Newell, 1986). The differences derive in part from our decision to 
represent principled knowledge as distinct from procedural knowledge. 

The second central feature of the HS system is that principled knowledge is represented in the form of
state constraints. A state constraint is a two-part pattern that a search state has to satisfy in order to be
valid. The first part of the pattern is used to decide whether the constraint is relevant in a particular state
or not; if so, then the second part of the pattern is used to decide whether the constraint is satisfied or not.
State constraints have superficial similarities to several other computational constructs, but they function
differently. A state constraint is not a production rule. It does not evoke motor actions, nor does it revise
the content of working memory. A state constraint is not an inference rule; in particular, it is not a Horn
clause.^16 The satisfaction pattern is not inferred or created when the relevance pattern matches. A state
constraint does not guarantee that its right-hand side is true when its left-hand side is true; it claims that
the right-hand side ought to be true. State constraints have three different functions in the HS system:
they constrain search during performance, they control when learning is to occur, and they serve as a
source of information about how the current procedure should be revised. The notion of a state constraint
is, as far as we know, unique to the work reported here. Other mechanisms for interfacing declarative
and procedural knowledge have been proposed in the context of other simulation models. They will be
discussed in a later section (Relations to Previous Research, p. 60).
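A state constraint with variables can be checked by a small pattern matcher. The sketch below is a hypothetical illustration: terms beginning with `?` are treated as variables (a convention chosen here, not HS syntax), negated conditions are not supported, and the function names `unify`, `match` and `violating_bindings` are illustrative. In keeping with the distinction drawn above, `violating_bindings` only tests the state. Unlike an inference rule, it never adds the satisfaction pattern to the state.

```python
def unify(pattern, fact, binding):
    """Extend binding so that pattern equals fact, or return None if impossible."""
    if len(pattern) != len(fact):
        return None
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):  # variable
            if extended.setdefault(p, f) != f:       # already bound to something else
                return None
        elif p != f:                                 # constant mismatch
            return None
    return extended


def match(patterns, state, binding=None):
    """Yield every binding under which all patterns are present in state."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in state:
        extended = unify(first, fact, binding)
        if extended is not None:
            yield from match(rest, state, extended)


def violating_bindings(relevance, satisfaction, state):
    """Bindings of the relevance pattern that cannot be extended to match the
    satisfaction pattern; the state itself is never modified."""
    return [b for b in match(relevance, state)
            if next(match(satisfaction, state, b), None) is None]


# "The answer is one of the numbers associated with some object."
relevance = [("Answer", "?n")]
satisfaction = [("Associate", "?x", "?n")]
state = {("Answer", "n3"), ("Associate", "x1", "n2")}
print(violating_bindings(relevance, satisfaction, state))  # [{'?n': 'n3'}]
```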

The third central feature of the HS system is that performance is guided by both generative and
evaluative selectivity. Generative selectivity operates through strategic rules that propose good moves. 
Strategic rules improve the efficiency of search by focusing attention on the most promising actions in 
each state. Evaluative selectivity operates through evaluation functions that measure the promise of a 
state. Evaluation functions improve the efficiency of search by focusing attention on the most promising 
states. Confusingly, both strategic rules and evaluation functions are called heuristics in the search 
literature (Groner, Groner, & Bischof, 1983; Pearl, 1984). A. I. systems typically employ one or the other
type of selectivity, but not both. The HS system operates with both generative and evaluative selectivity. 
Generative selectivity resides in the procedural knowledge (the production rules), while evaluative 
selectivity resides in the principled knowledge (the state constraints). The production rules generate 
actions, and the state constraints evaluate the states produced by those actions. The performance of the 
system is a function of both, and one type of selectivity can be traded off for the other. If either the 
procedure or the principled knowledge is correct and complete, correct performance will result. If both are
deficient, performance may or may not be correct; the outcome depends on particular interactions
between them. 

The fourth central feature of the HS system is the rational learning mechanism. HS does not learn by 
being told the procedure it is trying to learn, nor by inducing it from a set of solved examples, nor by 
generalizing over a set of successful steps found by trial and error. The HS system constructs a 
procedure by constraining it to be consistent with the relevant principles. The state constraints control
when learning is to occur: HS learns when a production rule generates a search state that violates some 
state constraint. By monitoring its performance with the state constraints, HS can know that a particular
rule is faulty without being told by an outside source, and before it has completed even a single solution 
path. The revision of the faulty rule is guided by the particular way in which the relevant state violates the 
constraint. The required revision of the rule is derived from the constraint violation through the HS 
revision algorithm (see page 21). Principled knowledge enables HS to deduce the proper revision of the 
rule. This type of learning mechanism bears a family resemblance to other types of knowledge-based
mechanisms, particularly to A. I. mechanisms for explanation-based learning (DeJong & Mooney, 1986;
Mitchell, Keller, & Kedar-Cabelli, 1986) but it contrasts with the experience-oriented character of most 
mechanisms proposed in psychological theories of procedure acquisition. 

The HS architecture is a model of the theory presented in the previous section in the sense that each 
hypothesis stated there is true of HS. However, HS is not the only possible model of that theory. In order
to bridge the gap between the abstract hypotheses of the theory and the concrete details of the model,
auxiliary assumptions had to be introduced as implementation proceeded. The most global auxiliary
assumption is that the human cognitive architecture is a production system. Although the production
system format has extensive support from other modelling efforts, a model of our theory could have been
implemented within some other type of information processing architecture. Even given the production
system format, many details of the model could have been implemented differently. For instance, we
chose to represent state constraints as binary patterns, and to relate them to search states through
pattern matching. Clearly, there are other implementations of the idea that principles enable error
detection. There are no hard and fast rules for how to construct the model for a particular theory.^17 The
particular implementation reported here was chosen on a variety of criteria such as interest and
simplicity. The justification for the implementation does not reside in the basis for the design decisions, 
but in the behavior of the resulting model. 

A large number of hypotheses are required to specify an information processing architecture. It is
almost impossible to derive predictions about the behavior of such a system by hand. The main purpose
of fleshing out the hypotheses of the theory with the auxiliary assumptions required for implementation is
precisely to use the implemented model to derive the behavioral predictions by running the model. The
next section describes a sample of behaviors of the HS system in the context of the acquisition of 
arithmetic procedures. 
Computational Results

The purpose of this section is to report three applications of the HS model that are relevant to the 
Conceptual Understanding Hypothesis. The first two applications demonstrate that HS can replicate the 
basic phenomena of children's learning in the domain of counting. First, we demonstrate that HS can 
construct a general counting procedure on the basis of the principles of counting, without receiving 
instruction in the procedure and without being given any solved examples. Second, we demonstrate that 
once HS has acquired a procedure for counting, the system can adapt that procedure to changes in the 
definition of the counting task. The third application demonstrates that the same mechanism that learns
successfully in the domain of counting also learns successfully in the domain of symbolic algorithms: We
verify that HS can cure itself of errors in its procedure for multi-column subtraction, if it is supplied with
a state constraint representation of the conceptual basis for that procedure.

Constructing a procedure for an unfamiliar task 

The basic claim of the Conceptual Understanding Hypothesis is that if a learner has principled
knowledge about the environment in which a particular task appears, then he/she can discover a correct
and general procedure for that task. The strongest evidence for this claim comes from the domain of
counting. Our first application of HS shows that HS can construct a procedure for counting on the basis
of a computational interpretation of the principles of counting. We describe the initial procedural
knowledge of HS in this application, the principled knowledge, the learning process, and the outcome of
the learning process. 


Initial procedural knowledge for standard counting 

To count a set of unordered objects is to repeatedly select an object from that set, increment the
current number, and associate the new number with the selected object. When all objects in the set have
been associated with numbers, the last number to be associated with an object is asserted to be the
answer to the counting problem. Riley, Greeno, and Gelman (1984) call this task standard counting.
Figure 1 shows a representational language for standard counting. The representation includes symbols
for objects, sets, and numbers, and for a handful of properties and relations that are relevant for the
counting task. Figure 2 shows a problem space for standard counting that builds on that representation.
The problem space includes six operators, corresponding to the capabilities to select an arbitrary object
from a set, to move attention from one object to another, to initialize counting at some number, to move
attention from one number to another, to associate a number with an object, and to assert that a
particular number is the answer to the current task. This set of capabilities is minimal in the sense that
there is no smaller set that enables the learner to count; if one of these capabilities is missing, the learner
is not ready to learn how to count. The initial state is encoded in the language defined in Figure 1. It
contains a segment of the number line and some objects, some of which are members of the set of
objects to be counted. The goal state is reached when some number has been identified as the answer to
the counting problem. 

Figure 3 shows an initial HS rule set for standard counting, as well as natural language paraphrases 
Types of entities:

The representational language used in the counting application of HS assumes three types of entities: 

• objects x1, x2, ...

• numbers n1, n2, ...

• sets 

The HS model for standard counting considers a single set, namely the set of to-be-counted objects, 
called ToCountSet. 

Properties:

There are four properties that apply to these entities:

• First 

• Current 

• Answer 

• Origin 

Both objects and numbers can have the properties of being the first object or number, and of being the
current object or number (in a sequence of events). A sequence of events can only have one entity that
has the property of being the first entity considered, and only one entity can be the current entity at any
one point in time. Only a number can have the property of being an answer. The property of being the
origin belongs to the smallest whole number the person knows. We assume in this application that it
belongs to unity. 

Relations: 

There are four binary relations that hold between these entities: 

• Next 

• Associate 

• Member

• After

Numbers are linked through the next relation. The expression (Next n1 n2) means that n2 is the
successor of n1 in the number line. A number and an object can be associated with each other. An
object can be a member of a set. In this application we only consider members of the set of to-be-
counted objects. One entity can be considered after another entity (in a temporal sequence of events).



Figure 1: A representational language for standard counting.

of the rules. The initial rules impose minimal guidance on the application of the operators. Their main
effect is to retrieve bindings for the operator arguments from working memory. Since the HS architecture
is a search system, the collection of rules in Figure 3, although seriously incomplete, nevertheless
constitutes an executable procedure. Execution of this procedure will generate ineffective but task
relevant behavior. For instance, counting will be initialized at an arbitrarily chosen point in the number line
(rule 3), and the number line will be traversed in random order (rule 4). Figures 2 and 3 together
The initial knowledge state:

The initial knowledge state for standard counting contains the number sequence (the numbers 1 through
n, where 1 is marked as the origin and each number is linked to its successor with the next relation), the
set ToCountSet of objects to be counted, and some additional objects that are not members of
ToCountSet. There is neither a current object nor a current number in the initial state.

Operators: 

PickFirst(X)   Declares object x as the first object; it thereby also becomes the current object.
The addition list $O_a$ is {(First X)(Current X)}.
The deletion list $O_d$ is empty.

PickNext(X1, X2)   Moves the property of being the current object from x1 to x2. Also records the
information that x2 was attended to after x1.
The addition list $O_a$ is {(Current X2)(After X2 X1)}.
The deletion list $O_d$ is {(Current X1)}.

Initialize(N)   Declares the number n as the first number; it thereby also becomes the current number.
The addition list $O_a$ is {(First N)(Current N)}.
The deletion list $O_d$ is empty.

Increment(N1, N2)   Moves the property of being the current number from n1 to n2. It also records the fact
that n2 was considered after n1.
The addition list $O_a$ is {(Current N2)(After N2 N1)}.
The deletion list $O_d$ is {(Current N1)}.

Associate(X, N)   Associates the number n with the object x.
The addition list $O_a$ is {(Associate X N)}.
The deletion list $O_d$ is empty.

Assert(N)   Asserts that the number n is the answer.
The addition list $O_a$ is {(Answer N)}.
The deletion list $O_d$ is empty.

Goal state:

The goal is to reach a state in which some number has the property of being the answer.

Figure 2: A problem space for standard counting.
constitute the initial procedural knowledge of HS in this application. 
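The weak guidance of the initial rules becomes visible if one enumerates everything they propose in a given state. In this hypothetical encoding (the functions `entities`, `currents` and `proposals` are illustrative names, not HS code) each rule of Figure 3 yields a list of operator instances. Following the paraphrases, rules 2 and 4 skip the case where the new entity equals the current one. Nothing restricts Initialize to the origin or Increment to the successor, which is exactly the looseness that learning must remove.

```python
def entities(state, kind):
    """Sorted names of all entities with the given type fact, e.g. ("Object", "x1")."""
    return sorted(fact[1] for fact in state if fact[0] == kind)


def currents(state, items):
    return [e for e in items if ("Current", e) in state]


def proposals(state):
    """All operator instances evoked by the six initial rules (hypothetical encoding)."""
    objs, nums = entities(state, "Object"), entities(state, "Number")
    cur_x, cur_n = currents(state, objs), currents(state, nums)
    out = [("PickFirst", (x,)) for x in objs]                                    # rule 1
    out += [("PickNext", (x1, x2)) for x1 in cur_x for x2 in objs if x2 != x1]   # rule 2
    out += [("Initialize", (n,)) for n in nums]                                  # rule 3
    out += [("Increment", (n1, n2)) for n1 in cur_n for n2 in nums if n2 != n1]  # rule 4
    out += [("Associate", (x, n)) for x in cur_x for n in cur_n]                 # rule 5
    out += [("Assert", (n,)) for n in cur_n]                                     # rule 6
    return out


start = frozenset({("Object", "x1"), ("Object", "x2"),
                   ("Number", "n1"), ("Number", "n2"), ("Number", "n3")})
midway = start | {("Current", "x1"), ("Current", "n2")}
```

In `midway`, for example, rule 4 proposes stepping back from n2 to n1 just as readily as forward to n3.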

Principled knowledge for standard counting

Principled knowledge is encoded in HS in the form of state constraints, each constraint consisting of a 
relevance pattern and a satisfaction pattern. The state constraints for standard counting are shown in 
Figure 4 (Part 1 and Part 2). For each constraint the relevance pattern $C_r$ is shown to the left and the
satisfaction pattern $C_s$ to the right, separated by an arbitrarily chosen symbol. For simplicity, type
designations like (Object X) and (Number N) have been left out of the statement of constraints. The 
1. If x is any object, then select x as the first object.
(Object X) ===> PickFirst(X)

2. If x1 is the current object, and x2 is any other object, then make x2 the current object.
(Object X1)(Current X1)(Object X2) ===> PickNext(X1, X2)

3. If n is any number, then initialize counting at n.
(Number N) ===> Initialize(N)

4. If n1 is the current number, and n2 is any other number, then switch to n2 as the current
number.
(Number N1)(Current N1)(Number N2) ===> Increment(N1, N2)

5. If n is the current number, and x is the current object, then associate n with x.
(Number N)(Current N)(Object X)(Current X) ===> Associate(X, N)

6. If n is the current number, then assert that n is the answer.
(Number N)(Current N) ===> Assert(N)

Figure 3: Initial rules for standard counting.

constraints are intended to capture the same ideas as the counting principles proposed by Gelman and
Gallistel (1978), but our analysis differs from theirs in its details. We have broken down the counting
principles into their component ideas and we have added some ideas that are not discussed by Gelman
and Gallistel (1978). 

The One-One Mapping Principle states that counting consists of establishing a one-to-one mapping 
between numbers and objects. As Part 1 of Figure 4 shows, we break this principle down into four 
component ideas: that an object is associated with at least one number, that an object is associated with 
at most one number, that a number is associated with at most one object, and that a number is 
associated with at least one object. The Cardinal Principle states that the answer to a counting
problem is the last number to be associated with an object. We break this principle down into three 
component ideas: that the size of a set cannot be known until all objects in the set have been associated 
with numbers, that the answer is a number associated with some object, and that the answer is the last 
number considered (see Part 1 of Figure 4). Our conception of the one-one mapping and cardinal 
principles is essentially the same as that of Gelman and Gallistel (1978). The difference is mainly that we
are using a more fine-grained analysis of the ideas involved. 
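The four component ideas of the One-One Mapping Principle can be checked on a finished set of associations. Note that this hypothetical helper judges a whole trace at once, whereas HS states each idea as a separate state constraint that is tested after every step. The function name `one_one_violations` and its message strings are illustrative only.

```python
from collections import defaultdict


def one_one_violations(associations, objects_considered, numbers_considered):
    """Check the four component ideas of the One-One Mapping Principle on a set
    of (object, number) associations; returns a list of problems found."""
    by_obj, by_num = defaultdict(set), defaultdict(set)
    for x, n in associations:
        by_obj[x].add(n)
        by_num[n].add(x)
    problems = []
    # an object considered is associated with at least one number
    problems += [f"object {x} has no number" for x in sorted(objects_considered) if not by_obj.get(x)]
    # an object is associated with at most one number
    problems += [f"object {x} has several numbers" for x in sorted(by_obj) if len(by_obj[x]) > 1]
    # a number is associated with at most one object
    problems += [f"number {n} has several objects" for n in sorted(by_num) if len(by_num[n]) > 1]
    # a number considered is associated with at least one object
    problems += [f"number {n} has no object" for n in sorted(numbers_considered) if not by_num.get(n)]
    return problems


double_count = one_one_violations({("x1", "n1"), ("x1", "n2")}, {"x1", "x2"}, {"n1", "n2"})
```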

The Stable Order Principle, on the other hand, does not appear in our analysis. This principle says 
that the numbers used in counting must have a stable, repeatable order. We want to suggest that this
principle contains four distinct ideas. The first idea is that the numbers form a linear ordering. This idea
is represented in the HS model (as in axiomatic theories of the number system) by the fact that the
symbols for the numbers are linked together with the successor relation (called Next). This 
representation amounts to an assumption that children have a cognitive representation of the number 
line, an assumption that is supported by the available evidence (Resnick, 1983). Because the Next 
relations are stored in the model's memory, no state constraints are needed to encode this idea. 

The second idea hiding in the Stable Order Principle is that the number line is traversed in a particular
way during counting. For correct counting the numbers must be generated in numerical order. Once the
number line has been stored in memory, it can be traversed in many different ways. For instance, it can
be traversed by skipping every other number, by generating numbers in descending order, etc. Also,
traversal of the number line can, in principle, begin at any point along the line (although human beings
may find some potential starting points easier to access than others). But the only way of traversing the
number line that gives correct results in counting is to begin at unity and then follow the successor
relations. We call this the Regular Traversal Principle. The state constraint representation breaks this
idea down into four component ideas: counting begins with the origin of the number line, each number
considered is the successor of the previous number, the numbers are considered one at a time, and each
number is associated with some object. The four state constraints corresponding to these ideas are
shown in Part 2 of Figure 4. 
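The first two component ideas of the Regular Traversal Principle (start at the origin, always move to the successor) can be sketched as a check on the sequence of numbers visited. In this hypothetical helper the dictionary `next_of` stands in for the stored Next relations; the other two component ideas are not covered.

```python
def regular_traversal(visited, successor, origin):
    """True if the visited numbers start at the origin and each one is the
    successor (via Next) of the number visited before it."""
    if not visited or visited[0] != origin:
        return False
    return all(successor.get(a) == b for a, b in zip(visited, visited[1:]))


# Next relations for the segment n1 .. n4 of the number line (hypothetical names).
next_of = {"n1": "n2", "n2": "n3", "n3": "n4"}
ok = regular_traversal(["n1", "n2", "n3"], next_of, "n1")
```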

The third idea hiding in the Stable Order Principle is that counting imposes a linear ordering on the
objects counted. By assigning numbers, which have an intrinsic linear ordering, to objects, which do not,
we are imposing a linear ordering on those objects. We call this idea the Order Imposition Principle. It is
broken down into six component ideas: only one object is designated as the first object in the ordering,
objects are considered one at a time, no object is considered twice, an object is not considered after itself,
the first object is never considered again,^18 and, finally, no object that is not a member of the to-be-
counted set is considered. The six state constraints that encode these ideas are shown in Part 2 of
Figure 4. 

Finally, the actions of traversing the number line in the right way and imposing an order on the objects
are not sufficient to produce correct counting. In addition, the two processes must be connected with
each other in the right way. The fourth idea hiding in the Stable Order Principle is that objects and
numbers are associated with each other in the order in which they are attended to. We call this the
Coordination Principle. The state constraint representation for this idea is shown in Part 2 of Figure 4. 

The state constraints in Figure 4 (Part 1 and Part 2) represent the principled knowledge of the HS 
system in this application. The set of constraints is not unique. Alternative formulations of the constraints
are possible. Also, the set is not minimal. The constraints overlap in meaning. For instance, constraints



^18 The constraints that an object is not to be considered after itself, and that the first object should never be considered a
second time are, of course, special cases of the general constraint that no object should be considered a second time.
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A. The One-One Mapping Principle 

1. An object should be assodated with at most one number. 
(Associate Ni)( Associate Nj) " (Equal Nj) 

2. Every object considered during counting should be associated with some number. 
(Current Xi)( After X^ Xj) " (Associate Xj N) 

3. A number should be associated with at most one object. 
(Associate X, Ni)( Associate Xj Nj) " (Equal X, Xj) 

4. For every number retrieved during counting there should be some object wHh which It can 

(Current N)(Not (Associate X, N)) " (Current Xj) 

B. The Cardinal Principle 

1. A number Is the answer to a counting problem only if there are no objects which are 
members of the to-be-counted set but which has not been associated with some number. 
(Answer N) " (Not (Member X ToCountS0t)(Not (Associate X N))) 

2. The answer to a counting problem Is one of the numbers associated with some object. 
(Answer N) " (Associate X N) 

^' JJlLTs^^^^ *° "'""^'"^ ""'"'^^ *° considered In the counting 

(Answer N) " (Current N) 



ERIC 



Figure 4: State constraints for standard counting, Part J. 
81 {see Figure 4. Part 1) and C4 (see Figure 4. Part 2) express the idea that all objects should be 
counted In two different ways. Also, constraints D4 and D5 are special cases of D3. Overiap In the 
meaning of state constraints Implies that learning from one constraint may mal<e learning from another 
constraint unnecessary. As a result, all constraints are not Involved In every learning mn. The set of 
state constraints In Figure 4 is complete, it Is sufficient to detennine correct counting. 

The learning process 

HS takes different paths through the procedure space on different learning runs, for two reasons.
First, if HS generates more than one state that violates some constraint on a particular cycle, it selects
one at random to learn from. Second, since the domain theory is not minimal, learning from one
constraint may preempt learning from another constraint. Hence, the order in which constraint violations
are noted by the system influences the path through the procedure space. The final procedures learned
in different learning runs are, of course, very similar, but not identical.
C. The Regular Traversal Principle

1. Initialize counting at the first number in the number line.
(First N) ** (Origin N)

2. Consider one number at a time.
(Current Ni)(Current Nj) ** (Equal Ni Nj)

3. The numbers should be considered in the order defined by the next relations.
(Current Ni)(After Ni Nj)(Not (Equal Ni Nj)) ** (Next Nj Ni)

4. For each number considered, the preceding number should be associated with some
object (i.e., use all numbers).
(Current Ni)(Next Nj Ni) ** (Associate X Nj)

D. The Order Imposition Principle

1. Initialize counting with a single object.
(First Xi)(First Xj) ** (Equal Xi Xj)

2. Do not consider an object that is already associated with a number.
(Current X)(Not (Current N)) ** (Not (Associate X N))

3. Do not cycle back to the first object.
(First Xi) ** (Not (After Xi Xj))

4. Do not consider an object after itself.
(After Xi Xj) ** (Not (Equal Xi Xj))

5. Consider only one object at a time.
(Current Xi)(Current Xj) ** (Equal Xi Xj)

6. Do not consider objects that are not in the set of to-be-counted objects.
(Current X) ** (Member X ToCountSet)

E. The Coordination Principle

1. Numbers and objects are associated with each other in the order in which they are
considered.
(Current X)(Current Ni)(Associate X Nj) ** (Equal Ni Nj)

Figure 4: State constraints for standard counting, Part 2.
We will analyze a particular learning experiment in which HS was started with the initial rule set for
counting shown in Figure 3 and the state constraints shown in Figure 4 and was given practice on
counting a set with three objects. During learning the model commits the types of counting errors
observed in children's performance, such as counting an object more than once, skipping numbers, and
choosing the wrong number as the answer. It successively corrects these errors by noticing violations of
the state constraints, and revising the initial rules accordingly.

As an example of the construction of a rule, consider rule 6 (see Figure 3): If n is the current number,
then assert that n is the answer. This rule will prematurely assert that the current number is the answer
when there are still objects left to be counted. HS learns the correct rule by transforming rule 6 in two
steps. Figure 5 shows a graph representation of the path through the rule space for this particular rule.
Learning proceeds from top to bottom. At the top of the figure is the formal version of the initial rule as
stated in Figure 3. The vertical arrows represent learning steps. At the head of each arrow is the condition
or conditions that were added to the rule in that step. Each learning step is triggered by the violation of a
state constraint. The constraint is shown to the right of the vertical arrows. The labels on the constraints
refer to Figure 4. The final rule is shown at the bottom of the graph. The reader who intends to follow the
description of how the correct rule is learned in detail may want to review the HS learning algorithm (p.
21) at this point.

The first learning step is triggered when the initial rule violates constraint B2: The answer to a
counting problem is one of the numbers associated with some object. The formal version of this
constraint is shown to the right in Figure 5. Suppose that, say, 2 is the current number. The condition
side R of rule 6 then becomes instantiated to:

$R = \{(\text{Number } 2)(\text{Current } 2)\}$

The addition list of operator Assert (see Figure 2, p. 30) is then equal to

$O_a = \{(\text{Answer } 2)\}$

while the deletion list $O_d$ is empty. The constraint is irrelevant before the Assert operator is fired, so we
have a Type A constraint violation, in which the execution of an operator makes the constraint
relevant but not satisfied. Two revisions of the faulty rule are attempted.
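The Type A test just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the HS implementation: states are sets of ground tuples such as `("Answer", "2")`, pattern variables are strings beginning with `?`, negated patterns are not handled, and the names `matches`, `violated`, and `type_a_violation` are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Fact = Tuple[str, ...]
Binding = Dict[str, str]


def unify(pattern: Fact, fact: Fact, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals fact, or return None."""
    if len(pattern) != len(fact):
        return None
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):  # variable: bind it, or check its existing binding
            if result.setdefault(p, f) != f:
                return None
        elif p != f:  # constant: must match exactly
            return None
    return result


def matches(patterns: List[Fact], state: Set[Fact], binding: Binding) -> Iterator[Binding]:
    """Yield every binding under which all patterns are facts in the state."""
    if not patterns:
        yield binding
        return
    for fact in state:
        extended = unify(patterns[0], fact, binding)
        if extended is not None:
            yield from matches(patterns[1:], state, extended)


def relevant(constraint, state: Set[Fact]) -> bool:
    """The relevance pattern matches the state under some binding."""
    relevance, _ = constraint
    return next(matches(relevance, state, {}), None) is not None


def violated(constraint, state: Set[Fact]) -> bool:
    """Relevant under some binding, but the satisfaction pattern cannot be matched."""
    relevance, satisfaction = constraint
    return any(
        next(matches(satisfaction, state, b), None) is None
        for b in matches(relevance, state, {})
    )


def type_a_violation(constraint, before: Set[Fact], add: Set[Fact],
                     delete: Set[Fact] = frozenset()) -> bool:
    """Irrelevant before the operator fires; relevant but unsatisfied afterwards."""
    after = (before - delete) | add
    return not relevant(constraint, before) and violated(constraint, after)


# Constraint B2: (Answer N) ** (Associate X N)
B2 = ([("Answer", "?n")], [("Associate", "?x", "?n")])
R = {("Number", "2"), ("Current", "2")}  # instantiated condition side of rule 6
O_a = {("Answer", "2")}                   # addition list of Assert

print(type_a_violation(B2, R, O_a))  # True
```

If some object had already been associated with 2, the same call would return False, since the satisfaction pattern would then match after Assert fires.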

Revision 1. Ensuring that the constraint is not relevant. The HS learning algorithm first tries to
construct the expression

$\mathrm{not}(C_r - O_a)$.

However, in this case $C_r$ is equal to

$C_r = \{(\text{Answer } 2)\}$

so the relevance pattern and the addition list are identical. Hence, the difference

$C_r - O_a = \{(\text{Answer } 2)\} - \{(\text{Answer } 2)\}$

is equal to the empty set, so there is nothing to negate and no new rule can be created in this revision.
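Revision 1 amounts to a set difference followed by negation. A minimal sketch, assuming instantiated patterns are ground tuples and writing the negated conjunction as a hypothetical `("Not", ...)` tuple; the helper name `revision_1` is illustrative only.

```python
from typing import FrozenSet, Optional, Tuple

Fact = Tuple[str, ...]


def revision_1(c_r: FrozenSet[Fact], o_a: FrozenSet[Fact]) -> Optional[Tuple]:
    """Return the condition not(C_r - O_a), or None when the difference is empty.

    An empty difference means the operator itself creates the whole relevance
    pattern, so no added condition can keep the constraint irrelevant.
    """
    remainder = c_r - o_a
    if not remainder:
        return None
    # Sort the remaining elements so that the new condition is deterministic.
    return ("Not",) + tuple(sorted(remainder))


# First learning step for rule 6: C_r = O_a = {(Answer 2)}.
print(revision_1(frozenset({("Answer", "2")}), frozenset({("Answer", "2")})))  # None

# A case where Revision 1 succeeds: C_r = {(First X1)}, O_a = {(After X1 X2)(Current X1)}.
print(revision_1(frozenset({("First", "X1")}),
                 frozenset({("After", "X1", "X2"), ("Current", "X1")})))
```

The second call returns `('Not', ('First', 'X1'))`, the negated condition that appears later for constraint D3.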

Revision 2. Ensuring that the constraint is satisfied. Next, the learning mechanism tries to construct
the expression
INITIAL RULE:

((Number N1)
 (Current N1)) ===> Assert(N1)

    | B2: (Answer N1) ** (Associate X1 N1)
    v
(Associate X1 N1)

    | B1: (Answer N1) ** (Not (Member X2 ToCountSet)
    |                         (Not (Associate X2 N2)))
    v
(Not (Member X2 ToCountSet)
     (Not (Associate X2 N2)))

FINAL RULE FOR STANDARD COUNTING:

((Number N1)
 (Current N1)
 (Associate X1 N1)
 (Not (Member X2 ToCountSet)
      (Not (Associate X2 N2)))) ===> Assert(N1)

Figure 5: A learning path for rule 6 (see Figure 3).
$(C_r - O_a) \cup (C_s - O_a)$.
The left-hand term is, as we just showed, empty, so this expression reduces to

$C_s - O_a$.

The satisfaction pattern $C_s$ is in this case equal to

$C_s = \{(\text{Associate } X\ 2)\}$

where X is some object. This expression will not change by the subtraction of $O_a = \{(\text{Answer } 2)\}$, so we
have

$C_s - O_a = \{(\text{Associate } X\ 2)\}$

Proper substitution of variables for constants leads to the expression (Associate X1 N1), which is added to
the rule.* In other words, the learning mechanism adds to the rule the condition that the number designated as the
answer has to be assigned to some object. The formal version of this condition is shown on
the path in Figure 5, at the head of the vertical arrow. Having revised the rule, HS backs up to the initial
state, and tries to do the counting task again, using the new rule instead of the original rule.
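Revision 2 and the subsequent replacement of constants by variables can be sketched in the same style. This is an illustration under assumptions: patterns are ground tuples, and the hypothetical `variabilize` helper simply applies a given constant-to-variable map (here X to X1 and 2 to N1). The coordination of variable names across the whole rule, which the report does not describe, is not modelled.

```python
from typing import Dict, FrozenSet, Set, Tuple

Fact = Tuple[str, ...]


def revision_2(c_r: FrozenSet[Fact], c_s: FrozenSet[Fact],
               o_a: FrozenSet[Fact]) -> Set[Fact]:
    """Return (C_r - O_a) | (C_s - O_a): the conditions that make the constraint hold."""
    return set(c_r - o_a) | set(c_s - o_a)


def variabilize(conditions: Set[Fact], names: Dict[str, str]) -> Set[Fact]:
    """Replace each term listed in `names` by its variable (hypothetical helper)."""
    return {tuple(names.get(term, term) for term in fact) for fact in conditions}


c_r = frozenset({("Answer", "2")})
c_s = frozenset({("Associate", "X", "2")})
o_a = frozenset({("Answer", "2")})

new_conditions = revision_2(c_r, c_s, o_a)
print(new_conditions)                                      # {('Associate', 'X', '2')}
print(variabilize(new_conditions, {"X": "X1", "2": "N1"}))  # {('Associate', 'X1', 'N1')}
```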

In the second learning step, the revised rule violates constraint B1 (see Figure 4): A number is the
answer to a counting problem only if there are no objects which are members of the to-be-counted set but
which have not been associated with some number. The rule is now constrained to select only numbers
that have been assigned to objects, but it does not yet know that it has to wait until all objects have been
counted. It prematurely asserts that the current number is the answer as soon as that number has been
assigned to an object. This is a Type A violation, because the constraint is not relevant until the operator
Assert has been fired. As in the previous learning step, the expression

$C_r - O_a$

is empty, so Revision 1 does not lead to the creation of a new rule. In Revision 2 HS constructs the
expression

$(C_r - O_a) \cup (C_s - O_a)$

which is equal to

$C_s - O_a$.

Since

$C_s = \{(\text{Not } (\text{Member } X\ \text{ToCountSet})(\text{Not } (\text{Associate } X\ N)))\}$

and

$O_a = \{(\text{Answer } N)\}$

the subtraction of the addition list from the satisfaction pattern simply gives the satisfaction pattern
unchanged. Therefore, the expression added to the rule in Revision 2 is equal to $C_s$. In short, the
learning mechanism adds the condition that no to-be-counted object remains uncounted or, formally, that it
should not be the case that there exists an object which is a member of the to-be-counted set and which
has not been assigned a number. The condition of the rule then becomes as shown in the bottom of



* In order for the new expression to interface correctly with the previous expressions in the rule, HS has to coordinate the variable
names. The computations involved in the coordination of variable names are not described in this report, but see Ohlsson & Rees
(1987). In this report we will simply assume that the variables are given the correct names.
INITIAL RULE:

((Object X1)
 (Current X1)
 (Object X2)) ===> PickNext(X1, X2)

    | A2: (Current X2)(After X2 X1) ** (Associate X1 N1)
    v
(Associate X1 N1)
    | D4: (After X2 X1) ** (Not (Equal X2 X1))
    v
(Not (Equal X2 X1))
    | D6: (Current X2) ** (Member X2 ToCountSet)
    v
(Member X2 ToCountSet)
    | D3: (First X2) ** (Not (After X2 X1))
    v
(Not (First X2))
    | E1: (Current N1)(Current X2)(Associate X2 N2) ** (Equal N1 N2)
    v
Revision 1:                        Revision 2 (monster rule):
(Not (Current N3)                  (Current N3)
     (Associate X2 N2))            (Associate X2 N2)
                                   (Equal N3 N2)

FINAL RULE FOR STANDARD COUNTING:

((Object X1)
 (Current X1)
 (Object X2)
 (Associate X1 N1)
 (Not (Equal X2 X1))
 (Member X2 ToCountSet)
 (Not (First X2))
 (Not (Current N3)
      (Associate X2 N2))) ===> PickNext(X1, X2)

Figure 6: A learning path for rule 2 (see Figure 3).
Figure 5. The result of this second learning step is a correct rule.*
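The negated condition added in the second step can be tested procedurally. The following is a minimal sketch under assumptions: facts are ground tuples, (Not ...) is read as "no binding of the inner conjunction exists", and the state encoding and the function name `final_rule_6_fires` are hypothetical.

```python
from typing import Set, Tuple

Fact = Tuple[str, ...]


def final_rule_6_fires(state: Set[Fact]) -> bool:
    """Check the condition of the final rule 6 (Figure 5) against a state.

    ((Number N1)(Current N1)(Associate X1 N1)
     (Not (Member X2 ToCountSet)(Not (Associate X2 N2))))
    """
    numbers = {f[1] for f in state if f[0] == "Number"}
    current = {f[1] for f in state if f[0] == "Current"}
    assigned_numbers = {f[2] for f in state if f[0] == "Associate"}
    assigned_objects = {f[1] for f in state if f[0] == "Associate"}
    members = {f[1] for f in state
               if f[0] == "Member" and f[2] == "ToCountSet"}

    # (Number N1)(Current N1)(Associate X1 N1): the current number is assigned.
    current_number_assigned = bool(numbers & current & assigned_numbers)
    # (Not (Member X2 ToCountSet)(Not (Associate X2 N2))): no member is uncounted.
    no_uncounted_member = members <= assigned_objects
    return current_number_assigned and no_uncounted_member


state = {
    ("Number", "1"), ("Number", "2"), ("Current", "2"),
    ("Member", "a", "ToCountSet"), ("Member", "b", "ToCountSet"),
    ("Member", "c", "ToCountSet"),
    ("Associate", "a", "1"), ("Associate", "b", "2"),
}
print(final_rule_6_fires(state))  # False: object c is still uncounted
```

Once c has been associated with 3 and 3 is the current number, the condition holds and Assert may fire.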

The learning of the correct rule for asserting the answer is a particularly simple example of a rule
transformation. The initial rule only has to be extended with two additional conditions, and only one new
rule is created in each learning step. The rule for selecting the next object, rule 2 in Figure 3, presents a
more complex case. Figure 6 shows the construction of the correct version of this rule. As in Figure 5,
the initial rule is shown at the top of the figure, the constraints that are violated are shown to the right of
the path, and the conditions added to the rule are shown along the path. The final, correct rule is shown
at the bottom of the figure. Five learning steps are required to construct the correct rule in this particular
simulation run.

In the first learning step, rule 2 violates the constraint that each object has to be associated with at
least one number (see Part 1 of Figure 4, constraint A2). This happens because the system moves
attention from one object to the next without counting it. This constraint violation follows the same pattern
as the ones analyzed previously. It is a Type A violation, where the first revision does not yield a new rule,
and the second revision consists of adding the satisfaction pattern of the constraint to the rule. Since $C_s$
in this case is

$C_s = \{(\text{Associate } X\ N)\}$

the learning mechanism adds the condition that the current object has to be counted before a new
current object can be selected. The next two violations follow the same pattern. The revised rule violates
the constraint that objects should not be counted repeatedly, and so the learning mechanism adds the
condition that counted objects should not be selected for counting again:

$C_s = \{(\text{Not } (\text{Equal } X_2\ X_1))\}$

The rule next selects some object that is not in the set of objects to be counted, and so is constrained to
deal only with those objects:

$C_s = \{(\text{Member } X\ \text{ToCountSet})\}$

In the fourth learning step the revised rule violates the constraint that it should not return to the first
object (see constraint D3 in Part 2 of Figure 4). This is yet another Type A violation, but in this case
Revision 1 yields a new rule while Revision 2 does not. Since in this case

$C_r = \{(\text{First } X_1)\}$

and

$O_a = \{(\text{After } X_1\ X_2)(\text{Current } X_1)\}$

the expression

$C_r - O_a$

is instantiated to

$\{(\text{First } X_1)\} - \{(\text{After } X_1\ X_2)(\text{Current } X_1)\}$

* The Cardinality Principle (constraint B3 in Part 1 of Figure 4) was not violated in this learning
run. This illustrates the earlier comment that the overlap in meaning between state constraints implies that learning from one
constraint may preempt learning from another.
so we have

$\mathrm{not}(C_r - O_a) = (\text{Not } (\text{First } X_1))$

which is added to the rule. This condition prevents the rule from firing when the object it considers making
the current object was, in fact, the first object counted. Revision 2 illustrates the complexities introduced
by negation. The satisfaction pattern of the relevant constraint is a negated pattern, and it happens to be
the case that the operator PickNext adds the positive part of that pattern to the state. Hence, Revision 2
cannot succeed. There is no way of revising the rule so that both the relevance pattern and the
satisfaction pattern are guaranteed to be true. In fact, whenever the PickNext operator fires, the
satisfaction pattern is guaranteed to be false. The learning mechanism recognizes that the operator adds
the negation of the satisfaction pattern, and does not create a second rule for this violation.
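The check that blocks Revision 2 here can be sketched as follows. It assumes, hypothetically, that a negated satisfaction element is written as a `("Not", fact)` pair and that the addition list is a set of ground tuples. If the operator adds the fact under a negation, any rule that fires the operator makes the satisfaction pattern false, so no second rule is built.

```python
from typing import FrozenSet, Tuple

Fact = Tuple[str, ...]


def revision_2_blocked(c_s: FrozenSet[Tuple], o_a: FrozenSet[Fact]) -> bool:
    """True if some negated element ("Not", fact) of C_s has its fact in O_a."""
    return any(
        element[0] == "Not" and element[1] in o_a
        for element in c_s
    )


# Constraint D3 as instantiated in the text: C_s = {(Not (After X1 X2))},
# and PickNext adds O_a = {(After X1 X2)(Current X1)}.
c_s = frozenset({("Not", ("After", "X1", "X2"))})
o_a = frozenset({("After", "X1", "X2"), ("Current", "X1")})
print(revision_2_blocked(c_s, o_a))  # True
```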

Finally, in the fifth learning step, the rule gets out of step, as it were, and violates constraint E1 (see
Part 2 of Figure 4), which says that numbers and objects are associated with each other in the order in which
they are generated. This is, once again, a Type A violation, but in this case both revisions generate
non-empty extensions of the rule, so two new rules are created.

Revision 1. Ensuring that the constraint is not relevant. We have

$C_r = \{(\text{Current } N_1)(\text{Current } X_1)(\text{Associate } X_1\ N_2)\}$

and

$O_a = \{(\text{After } X_1\ X_2)(\text{Current } X_1)\}$

Hence, the expression

$\mathrm{not}(C_r - O_a)$

is in this case equal to

$\mathrm{not}(\{(\text{Current } N_1)(\text{Current } X_1)(\text{Associate } X_1\ N_2)\} - \{(\text{After } X_1\ X_2)(\text{Current } X_1)\})$

which reduces to

$\mathrm{not}((\text{Current } N_1)(\text{Associate } X_1\ N_2))$.

This expression is added to the rule. The final result is a rule that says "If the current object has been
associated with a number, and there is a second object that is a member of the set of objects to be
counted, but that is not the first object, and that has not been associated with a number, then move
attention to that second object", which is the correct rule, shown at the bottom of Figure 6.

Revision 2. Ensuring that the constraint is satisfied. Next, we have the expression

$(C_r - O_a) \cup (C_s - O_a)$

which does not reduce to the empty list in this case. The part $C_r - O_a$ is, as we have seen above, equal
to

$\{(\text{Current } N_1)(\text{Associate } X_1\ N_2)\}$.

The expression $C_s - O_a$ becomes

$\{(\text{Equal } N_1\ N_2)\} - \{(\text{After } X_1\ X_2)(\text{Current } X_1)\}$

which reduces to

$\{(\text{Equal } N_1\ N_2)\}$.

Hence, the set union of the two expressions is equal to

$\{(\text{Current } N_1)(\text{Associate } X_1\ N_2)(\text{Equal } N_1\ N_2)\}$

which is then added to the rule to create a second new rule.
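Both revisions for constraint E1 can be reproduced with ordinary set operations on the instantiated patterns quoted above. A sketch, assuming patterns are ground tuples: because both results are non-empty, two new rules arise, one with the negated conjunction (Revision 1) and one with the union (Revision 2, the monster rule discussed next).

```python
from typing import FrozenSet, Tuple

Fact = Tuple[str, ...]

# Instantiated patterns for constraint E1 and operator PickNext, as in the text.
c_r: FrozenSet[Fact] = frozenset(
    {("Current", "N1"), ("Current", "X1"), ("Associate", "X1", "N2")})
c_s: FrozenSet[Fact] = frozenset({("Equal", "N1", "N2")})
o_a: FrozenSet[Fact] = frozenset({("After", "X1", "X2"), ("Current", "X1")})

# Revision 1: the conjunction to be negated, so the constraint cannot become relevant.
rev1 = c_r - o_a
# Revision 2: the conditions to be asserted, so the constraint is satisfied.
rev2 = (c_r - o_a) | (c_s - o_a)

print(sorted(rev1))  # [('Associate', 'X1', 'N2'), ('Current', 'N1')]
print(sorted(rev2))  # [('Associate', 'X1', 'N2'), ('Current', 'N1'), ('Equal', 'N1', 'N2')]
```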

The rule created in Revision 2 of the fifth and last learning step is not a correct rule, but a so-called
monster rule. It is a syntactically correct and executable rule which is simply not part of correct counting.
The rule says that if the current object has been assigned to the current number n, and some other
object has previously been assigned to n, then select that object as the next object, which is a manifestly
incorrect counting rule. The rule is harmless, i.e., it will never fire, if all the other rules are correct,
because two objects will never be assigned to the same number. However, if other rules are also
incorrect, then this rule might fire. It will generate the error of going back and counting a previously
counted object a second time.

Although we have analyzed the construction of rules 6 and 2, respectively, as sequences of learning
steps, those steps did not occur on successive trials during the learning run. HS does not first go through
all required revisions of one rule, and then turn to another rule, etc. The learning steps required to
construct the correct versions of rules 2 and 6 occurred interspersed among the learning steps required to
construct the other rules. The order of learning steps is determined by the order in which HS encounters
violations of constraints. In order to make the learning process easier to follow, Figures 5 and 6 abstract
out the revisions of rules 6 and 2, respectively, from the trace of the simulation run, and present them in
sequence. This is an expository technique; it is not how HS learns.

An overview of the entire learning process is given in Table 1. The particular learning run analyzed
here required twenty-two trials. HS practiced on a set of three objects. Each trial consists of a problem
solving attempt in which HS executes its procedure until a constraint violation is discovered, revises the
faulty rule, and starts over. The twenty-two learning trials were accomplished in 97 production system
cycles. Each line in Table 1 corresponds to one trial. The first column shows the trial number. The
second column shows the number of cycles before a constraint violation was detected in each trial. As
the table shows, the number of cycles increases over trials: HS gradually performs larger and larger
portions of the task correctly. The third column shows the constraint that was violated in that trial. The
violated constraint is the constraint that HS learned from in that trial. Finally, the last column represents
the six rules with the digits 1 through 6; the rule that was revised on that trial corresponds to the
bracketed number. In the twenty-third trial (not shown in the table), HS counted the set of three
objects correctly. The correct solution to the problem of counting three objects required eleven production system
cycles.

As the table shows, the two learning steps that transformed rule 6 into the correct rule occurred on
trials 8 and 12, respectively, while the five learning steps required to learn the correct form of rule 2 are
spread out over the entire learning process, beginning with trial 5 and ending with trial 22. The table also
shows that a constraint can be violated by several different rules. For instance, constraint E1 is violated
by rules 2 (trial 22), 3 (trial 11), and 4 (trial 13). The table also shows that not all constraints are involved
in the learning run. For instance, constraint D2 was not violated in this particular run. The particular
learning process HS goes through on the way to mastery of standard counting is a function of the
representation, the initial rules, the state constraints, and the order in which violations are discovered.
Different simulation runs will yield slightly different learning processes.

Table 1: Overview of the learning process for standard counting.

Trial   Cycles before          Constraint   Rules 1-6
        constraint violation   violated     (revised rule in brackets)
  1        1                   A4           1 2 [3] 4 5 6
  2        1                   D6           [1] 2 3 4 5 6
  3        2                   D1           [1] 2 3 4 5 6
  4        2                   C4           1 2 [3] 4 5 6
  5        2                   A2           1 [2] 3 4 5 6
  6        3                   C4           1 2 3 [4] 5 6
  7        3                   D4           1 2 3 [4] 5 6
  8        3                   B2           1 2 3 4 5 [6]
  9        4                   D4           1 [2] 3 4 5 6
 10        4                   D6           1 [2] 3 4 5 6
 11        4                   E1           1 2 [3] 4 5 6
 12        4                   B1           1 2 3 4 5 [6]
 13        4                   E1           1 2 3 [4] 5 6
 14        5                   D1           1 2 [3] 4 5 6
 15        5                   D1           1 2 [3] 4 5 6
 16        5                   A3           1 2 3 4 [5] 6
 17        6                   D4           1 2 3 [4] 5 6
 18        6                   D3           1 2 3 [4] 5 6
 19        7                   D3           1 [2] 3 4 5 6
 20        7                   D3           1 2 [3] 4 5 6
 21        9                   C3           1 2 3 [4] 5 6
 22       10                   E1           1 [2] 3 4 5 6
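The totals quoted in the text can be recomputed from Table 1. The sketch below transcribes the table (the entries for trials 14 and 15 are partly illegible in the source and are read here as D1 and 5 cycles, which is an assumption) and checks the total of 97 cycles, the number of revisions per rule, the learning steps for rule 2, and the constraints that were never violated.

```python
from collections import Counter

# Table 1, one tuple per trial: (trial, cycles before violation, constraint, rule revised).
TABLE_1 = [
    (1, 1, "A4", 3), (2, 1, "D6", 1), (3, 2, "D1", 1), (4, 2, "C4", 3),
    (5, 2, "A2", 2), (6, 3, "C4", 4), (7, 3, "D4", 4), (8, 3, "B2", 6),
    (9, 4, "D4", 2), (10, 4, "D6", 2), (11, 4, "E1", 3), (12, 4, "B1", 6),
    (13, 4, "E1", 4), (14, 5, "D1", 3), (15, 5, "D1", 3), (16, 5, "A3", 5),
    (17, 6, "D4", 4), (18, 6, "D3", 4), (19, 7, "D3", 2), (20, 7, "D3", 3),
    (21, 9, "C3", 4), (22, 10, "E1", 2),
]

# Constraint labels from Figure 4, Parts 1 and 2.
ALL_CONSTRAINTS = (["A1", "A2", "A3", "A4", "B1", "B2", "B3", "C1", "C2", "C3", "C4"]
                   + [f"D{i}" for i in range(1, 7)] + ["E1"])

total_cycles = sum(cycles for _, cycles, _, _ in TABLE_1)
revisions_per_rule = Counter(rule for _, _, _, rule in TABLE_1)
rule_2_steps = [(trial, c) for trial, _, c, rule in TABLE_1 if rule == 2]
never_violated = sorted(set(ALL_CONSTRAINTS) - {c for _, _, c, _ in TABLE_1})
cycles_nondecreasing = all(a[1] <= b[1] for a, b in zip(TABLE_1, TABLE_1[1:]))

print(total_cycles)                                  # 97
print(revisions_per_rule[6], revisions_per_rule[2])  # 2 5
print(rule_2_steps)
print(never_violated)
print(cycles_nondecreasing)                          # True
```

The rule 2 steps come out as A2, D4, D6, D3, E1, the order shown in Figure 6, and D2 and B3 appear among the constraints never violated.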

The learning outcome 

The final outcome of learning is a procedure for standard counting that counts correctly. It consists of
six rules, corresponding to the six rules in the initial procedural knowledge (see Figure 3), but with the
conditions revised in such a way as to produce correct performance. The final rules are shown in Figure 7
(Part 1 and Part 2). The level of generality of the learned counting procedure is the same as the level of
generality of the constraints. The learned procedure transfers to arbitrarily large sets, i.e., to sets it has
not practiced on.
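What the six final rules jointly do can be sketched procedurally. This is not the HS production system: it hard-codes one firing order, uses Python integers for the number line, and the function name `count_objects` is hypothetical. It does illustrate why nothing in the learned rules depends on the size of the practice set.

```python
from typing import Dict, Hashable, Iterable, Optional


def count_objects(to_count_set: Iterable[Hashable]) -> Optional[int]:
    """Apply the six final rules in one fixed order and return the asserted answer.

    Hypothetical procedural rendering of Figure 7; HS itself selects rules by
    matching conditions against the state on every production system cycle.
    """
    objects = list(to_count_set)
    if not objects:
        return None  # rule 1 (PickFirst) has nothing to select
    associations: Dict[Hashable, int] = {}

    current_object = objects[0]  # rule 1: select a first object
    current_number = 1           # rule 3: initialize at the origin of the number line
    while True:
        associations[current_object] = current_number  # rule 5: associate
        uncounted = [x for x in objects if x not in associations]
        if not uncounted:          # rule 6: no member of the set is left uncounted
            return current_number  # assert the current number as the answer
        current_object = uncounted[0]  # rule 2: pick the next uncounted object
        current_number += 1            # rule 4: increment to the successor


print(count_objects(["cup", "pen", "key"]))  # 3
print(count_objects(range(25)))              # 25
```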

The outcome of the above learning simulation is in accord with the empirical data from the counting
domain, as well as with the Conceptual Understanding Hypothesis: HS is able to discover the correct
procedure for standard counting without being given a description of the procedure, without seeing any
solved examples, and without being given an explanation of the procedure. The procedure is constructed
incrementally in an effort to avoid violating the counting principles. The Conceptual Understanding
Hypothesis also claims that procedures constructed in this way are flexible when confronted with a
variation of the relevant task. The next application deals with this phenomenon.

Adapting a procedure to a change in a familiar task 

Life rarely presents us with totally new tasks. There are always some similarities between a new task
and some previously mastered task. The Conceptual Understanding Hypothesis claims that if a
procedure has been constructed on the basis of principled knowledge of the task environment, then the
learner should be able to adapt that procedure to a conceptually equivalent but procedurally different
task. Hence, in our second application we verify that the counting principles enable HS to adapt its
procedure for standard counting to two changes in the task. First, we modify the standard counting task
by requiring that the objects be counted in a predefined order (ordered counting). Second, we modify the
standard counting task by requiring that the objects be counted in such a way that a particular object is
assigned a particular number (constrained counting). Empirical research has shown that children can
readily adapt to these two non-standard counting tasks (Gelman & Gallistel, 1978; Gelman & Meck, 1983,
1986; Gelman, Meck, & Merkin, 1988). In both simulations, we first have HS discover the procedure for
standard counting in the way analyzed in the previous subsection. Then we change the task, and observe
how HS transfers the old procedure to the new task.

Transferring from standard to ordered counting

In ordered counting, objects are counted in accordance with some predefined ordering. Ordered
counting differs from standard counting with respect to the selection of objects. Rather than selecting any
1. If x1 is any object, x1 is a member of the ToCountSet, and no object has yet been selected
as the first object, then select x1 as the first object.

(Object X1)(Member X1 ToCountSet)(Not (Object X2)(First X2))
===> PickFirst(X1)

2. If x1 is the current object, x1 has been associated with some number n1, x2 is any other
object, x2 is a member of the ToCountSet, x2 is not the first object, and it is not the case that
there is a current number n3 and that x2 has been associated with some number n2, then
switch to x2 as the current object.

(Object X1)(Current X1)(Associate X1 N1)(Object X2)(Member X2 ToCountSet)
(Not (First X2))(Not (Current N3)(Associate X2 N2)) ===> PickNext(X1, X2)

3. If n1 is any number, there is no object x1 such that n1 has been associated with x1, some object
x2 has been selected as the current object, and there is no number n2 such that n1 is the
successor to n2, then begin counting with n1.

(Number N1)(Not (Object X1)(Associate X1 N1))(Object X2)(Current X2)
(Not (Number N2)(Next N2 N1)) ===> Initialize(N1)

4. If n1 is the current number, n2 is any other number, n3 is the predecessor of n2, n3 is
associated with some object x, x is not the current object, and n1 is the predecessor of n2,
then switch to n2 as the current number.*

(Number N1)(Current N1)(Number N2)(Next N3 N2)(Associate X N3)(Not (Current X))
(Not (Equal N2 N1))(Next N1 N2) ===> Increment(N1, N2)

5. If n is the current number, x1 is the current object, and n has not been associated with
any other object, then associate n with x1.

(Number N)(Current N)(Object X1)(Current X1)(Not (Associate X2 N))
===> Associate(X1, N)

6. If n1 is the current number, n1 has been associated with some object x1, and there is no
object x2 in the ToCountSet that has not been associated with some number n2, then assert
that n1 is the answer.

(Number N1)(Current N1)(Object X1)(Associate X1 N1)
(Not (Object X2)(Member X2 ToCountSet)(Not (Number N2)(Associate X2 N2)))
===> Assert(N1)

Figure 7: Final rules discovered by HS for standard counting.

* The formulation of this rule is opaque, because it introduces two symbols, n1 and n3, for the same number. The two conditions
that claim that n1 and n3 are predecessors to n2 constitute an implicit equality-test that binds together the expressions in the rule
condition. If the program knew the meaning of the predecessor relation, it could, in principle, transform the rule into a less opaque
form. However, the rule as stated here is the form that was actually learned in the particular learning experiment we are reporting.
F. Ordering Constraints

1. Objects are considered from left to right.

(Current X1)(After X1 X2)(Adjacent X1 X3)(LeftOf X3 X1) ** (Equal X2 X3)

2. Objects are associated with numbers in order from left to right.

(Current X1)(Object X1)(Object X2)(Adjacent X2 X1)(LeftOf X2 X1) ** (Associate X2 N)

Figure 8: Constraints that define ordered counting.

object, the system now has to select one according to certain criteria. The ordered counting task was
defined for HS by extending the inputs to the program in two ways. First, we extended the initial
knowledge state by adding left-of and adjacent relations between the objects, thereby imposing an order
on the set of objects to be counted. Second, we extended the principled knowledge of the model. In
unordered counting, the act of counting imposes a linear ordering on a set of objects that does not have
an intrinsic order. In ordered counting, however, the set of to-be-counted objects has an ordering given to
it by the setting, and the task is to traverse that order. In this application HS was required to count
objects in order from left to right. Two new constraints express this idea. The first order constraint says
that if the current object is considered after another object, then that other object should be immediately to the
left of the current object, i.e., objects should be considered in order from left to right. The second order constraint says that
objects should be assigned numbers according to the given order, which in this case means from left to
right. The state constraint representation of these two ideas is shown in Figure 8.
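Constraint F1 and the left-to-right choice of the next object can be sketched with a simple neighbour map. The encoding is a hypothetical simplification: `left_neighbour[x] = y` stands for the pair of relations (Adjacent x y)(LeftOf y x), and the function names are illustrative only.

```python
from typing import Dict, List, Optional


def next_object_ordered(current: str, left_neighbour: Dict[str, str]) -> Optional[str]:
    """Return the object whose immediate left neighbour is `current`, if any."""
    for obj, left in left_neighbour.items():
        if left == current:
            return obj
    return None


def satisfies_f1(attention_order: List[str], left_neighbour: Dict[str, str]) -> bool:
    """Constraint F1: each object considered after another must have that
    other object immediately to its left."""
    return all(
        left_neighbour.get(curr) == prev
        for prev, curr in zip(attention_order, attention_order[1:])
    )


# Three objects in a row, a b c from left to right.
row = {"b": "a", "c": "b"}
print(next_object_ordered("a", row))        # b
print(satisfies_f1(["a", "b", "c"], row))  # True
print(satisfies_f1(["a", "c", "b"], row))  # False
```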

In this simulation experiment HS first learned the procedure for standard counting in the way
described in the previous subsection. We then posed the task of counting the objects in order from left to
right, and ran the system again on the new task. Some of the rules HS learned for the standard counting task
are correct for ordered counting too. The rules for initializing counting at unity, for incrementing the
counting number, for associating a number with an object, and for asserting a number as the answer are
all correct for the ordered counting task. But the two rules for selecting a first object and for selecting a
next object produce constraint violations, and are revised to fit the new task.

For instance, rule 2, the rule that selects the next object, has no conditions that constrain it to select
objects in order from left to right. Figure 9 shows the search through the rule space for rule 2 in this
application. The top part of the figure, before the box labelled "Adaptation to ordered counting", shows
the initial construction of rule 2 and is identical with Figure 6. The learning step inside the box is caused
by the rule violating ordering constraint F1 (see Figure 8).

Two new rules are created in this learning step. The rule created by Revision 2, shown at the bottom
and to the right in Figure 9, is the correct rule for selecting the next object in the ordered counting task.
    | D4: (After X2 X1) ** (Not (Equal X2 X1))
    v
(Not (Equal X2 X1))
    | D6: (Current X2) ** (Member X2 ToCountSet)
    v
(Member X2 ToCountSet)
    | D3: (First X2) ** (Not (After X2 X1))
    v
(Not (First X2))
    | E1: (Current N1)(Current X2)(Associate X2 N2) ** (Equal N1 N2)
    v
(Not (Current N3)                  (Current N3)
     (Associate X2 N2))            (Associate X2 N2)
                                   (Equal N3 N2)
    |
    |  ADAPTING TO ORDERED COUNTING
    |  F1: (Current X2)(After X2 X1)(Not (Equal X2 X1))
    |      (Adjacent X2 X3)(LeftOf X3 X2) ** (Equal X1 X3)
    v
(Not (Adjacent X2 X3)              (Adjacent X2 X3)
     (LeftOf X3 X2))               (LeftOf X3 X2)
                                   (Equal X1 X3)

FINAL RULE FOR STANDARD COUNTING:

((Object X1)
 (Current X1)
 (Object X2)
 (Associate X1 N1)
 (Not (Equal X2 X1))
 (Member X2 ToCountSet)
 (Not (First X2))
 (Not (Current N3)
      (Associate X2 N2))
 (Not (Adjacent X2 X3)
      (LeftOf X3 X2))) ===> PickNext(X1, X2)

FINAL RULE FOR ORDERED COUNTING:

((Object X1)
 (Current X1)
 (Object X2)
 (Associate X1 N1)
 (Not (Equal X2 X1))
 (Member X2 ToCountSet)
 (Not (First X2))
 (Not (Current N3)
      (Associate X2 N2))
 (Adjacent X2 X3)
 (LeftOf X3 X2)
 (Equal X1 X3)) ===> PickNext(X1, X2)
The rule created in Revision 1, shown at the bottom and to the left in Figure 9, is a modification of the rule
for standard counting. The performance of this rule will depend on the perceptual encoding of the
problem situation. If the initial knowledge state encodes the objects to be counted as unordered, this rule
will function correctly. Hence, the outcome of this learning step is a procedure that can solve both tasks
correctly. However, if the initial state contains information about the ordering relations of the objects to be
counted, then this rule will refuse to fire. This amounts to a prediction that, having adapted to ordered
counting, the learner cannot perform unordered counting if he or she pays attention to the ordering relations
between the objects. After this adaptation the system will always count according to the ordering
relations between the objects, if those are encoded in the initial state.

Without principled knowledge about the task (without a representation of the task that is more abstract
than the rules themselves) there is no way of knowing which rules are still relevant and which are not
when the task is changed. Therefore, an empirical learning system would have to construct a new
procedure from scratch for the new variant of the task. HS, on the other hand, knows that a rule needs to
be revised only if it produces constraint violations, and not otherwise. Hence, it can back up the minimal
distance in the procedure space that is needed to transfer its current procedure to the new task. The
construction of the procedure for standard counting required twenty-two learning steps, but the adaptation
to the ordered counting task only requires two learning steps. HS shows considerable transfer from one
task to the other.

The ability of HS to adapt to a change in the task does not depend on the particular characteristics of the switch from unordered to ordered counting. For instance, it does not depend on the fact that this switch involves the addition of constraints. In a different learning experiment HS learned to adapt in the opposite direction. In this experiment HS began by constructing the procedure for ordered counting. We then switched the task to standard counting. Figure 10 shows the path through the rule space for rule 2 in this learning experiment. The initial construction of the correct rule for ordered counting is shown along the right branch of the figure. It consists of three learning steps, caused by the violation of constraints D3, F1, and A2, in that sequence. The final, correct rule for ordered counting is shown to the right in the figure.

As the figure shows, learning step 2 produces a pair of rules, only one of which is the correct rule for ordered counting. When HS is confronted with the standard counting task, the system backs up in the rule space to this point, and fires the other rule produced in learning step 2. This rule, a supposedly "incorrect" rule generated during the learning of ordered counting, is developed into the correct rule for unordered counting in three further learning steps, shown inside the box labelled "Adaptation to unordered counting" in Figure 10. Hence, the final result is again a procedure that can do both standard counting and ordered counting correctly.

The third learning step inside the box (labelled step 6 in the figure) produces two rules, one of which is the final rule for standard counting. The other rule is yet another example of a rule created during learning that is not part of the correct procedure. It does not fire in either standard or ordered counting,
INITIAL RULE:

((Object X1)
 (CurrentObject X1)
 (Object X2)) ===> PickNext(X1, X2)

[Intermediate revisions of rule 2, including the box "Adapting to Unordered Counting"; each revision step is labelled with the constraint whose violation triggered it.]

FINAL RULE FOR ORDERED COUNTING:

((Object X1)
 (CurrentObject X1)
 (Object X2)
 (Not (First X2))
 (Not (Equal X2 X1))
 (Adjacent X2 X3)
 (LeftOf X3 X2)
 (Equal X1 X3)
 (Associate X1 N1)) ===> PickNext(X1, X2)

FINAL RULE FOR STANDARD COUNTING:

((Object X1)
 (CurrentObject X1)
 (Object X2)
 (Not (First X2))
 (Not (Not (Equal X2 X1))
      (Adjacent X2 X3)
      (LeftOf X3 X2))
 (Associate X1 N1)
 (Member X2 ToCountSet)
 (Not (CurrentTag N2)
      (Associate X2 N3))) ===> PickNext(X1, X2)

Figure 10: Revisions of rule 2 (see Figure 7) in adaptation to unordered counting.
but could conceivably fire in some other, yet-to-be-invented task.

The amount of learning required to adapt from ordered to standard counting is not the same as the amount of learning required to adapt in the other direction. The switch from standard to ordered counting required only two learning steps, one for each of rule 1 and rule 2, while the switch in the opposite direction requires a total of five learning steps, three for rule 2 (shown in Figure 10) and two for rule 1 (not shown). HS predicts that transfer of training between pairs of tasks is asymmetric.

Transferring from standard to constrained counting 

In the task of constrained counting the learner counts an unordered set, but is required to choose objects in such a way that a designated object becomes associated with a designated number. For instance, the learner might be instructed to count in such a way that, say, the third object from the left becomes associated with, say, the number five. We present this task to HS by adding the constraints shown in Figure 11. The first constraint represents the general idea that the designated object is associated with the designated number. The two following constraints express the special case of this idea for the initial object and the first number.



G. Designation Constraints

1. Associate the designated object with the designated number.

(Current X2)(Designated X2)(Designated N2)(After X2 X1)(Associate X1 N1) ** (Next N1 N2)

2. Choose the designated object as the first object only if the designated number is the first number in the number line.

(Current X)(Designated X)(First X)(Designated N) ** (Origin N)

3. When the designated number is the first number in the number line, and the current object is the first object counted, then it should be the designated object.

(Current X1)(First X1)(Designated X2)(Designated N1)(Origin N1) ** (Equal X1 X2)

Figure 11: Constraints that define constrained counting.
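The task itself (not the way HS solves it) can be illustrated with a short Python sketch. Assuming that objects are counted one at a time and receive the numbers 1, 2, 3, ... in order, the hypothetical helper `constrained_order` builds a counting order in which the designated object receives the designated number, and `satisfies_designation` checks the condition that the first constraint in Figure 11 expresses. HS does not compute such an order directly; it arrives at one by search guided by the constraints.

```python
from typing import Dict, List


def constrained_order(objects: List[str], designated: str, number: int) -> List[str]:
    """Order `objects` so that `designated` is counted as the `number`-th object."""
    if designated not in objects or not 1 <= number <= len(objects):
        raise ValueError("need a designated object in the set and 1 <= number <= set size")
    others = [o for o in objects if o != designated]
    # Count number - 1 other objects first, then the designated one, then the rest.
    return others[:number - 1] + [designated] + others[number - 1:]


def numbers_assigned(order: List[str]) -> Dict[str, int]:
    # Each object is associated with the next number in the number line.
    return {obj: n for n, obj in enumerate(order, start=1)}


def satisfies_designation(order: List[str], designated: str, number: int) -> bool:
    return numbers_assigned(order)[designated] == number


row = ["a", "b", "c", "d", "e"]               # "c" is the third object from the left
order = constrained_order(row, "c", 5)
print(order)                                   # ['a', 'b', 'd', 'e', 'c']
print(satisfies_designation(order, "c", 5))    # True
print(satisfies_designation(row, "c", 5))      # False
```

Note that counting the row simply from left to right violates the designation, which is why the standard counting rule has to be revised for this task.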



As in the previous simulation experiment, HS first learned the procedure for standard counting. We then changed the task to constrained counting, and ran the system again. Figure 12 shows the effect on the rule for selecting the next object. At the top of the figure is the final rule for unordered counting. As we see, the rule violated the first constraint in Figure 11, which leads to the construction of two new rules. In this case, both of the new rules are relevant for the task of constrained counting. There is considerable
transfer from one task to the other in this case also, because HS knows, as it were, which rules to revise. As Figure 12 shows, it only requires one learning step to adapt rule 2 to the constrained counting task. It required a total of three learning steps to adapt to constrained counting.

The two demonstrations in this section show that HS can do what Gelman and co-workers have
shown that children can do: adapt a counting procedure to a change in the task demands, rather than
having to construct a new procedure from scratch. The pedagogical hope expressed in the Conceptual 
Understanding Hypothesis is that since children can learn to count with understanding, they might also be 
able to learn to carry out the symbolic algorithms for arithmetic with understanding. The next question is 
therefore whether the HS learning mechanism can produce intelligent learning in the domain of symbolic 
algorithms. This is the topic of the next application. 


Correcting errors in a symbolic algorithm

The Conceptual Understanding Hypothesis claims that a learner who constructs a procedure on the basis of principled knowledge is able to spontaneously correct nonsensical errors, without being told what the correct rule is by an outside source, and without having access to a correctly solved example. If the learning of symbolic algorithms such as the subtraction algorithm can proceed in an insightful fashion, the learner should be able to recover from the standard subtraction bugs observed in children's performance. In our third application we show that the HS system can correct errors in a procedure for multi-column subtraction on the basis of knowledge of the principles of subtraction.

In this application HS operates in the standard problem space for subtraction (Ohlsson & Langley, 1985, 1988).22 A subtraction problem is described in terms of the values (v_1, v_2, ...) of the digits in the problem, columns (c_1, c_2, ...), and rows (r_1, r_2, ...). The columns are numbered from right to left. The initial state contains a description of the spatial layout of the rows and columns, the particular digits of the current problem, a portion of the number line, and the relevant number facts.
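A minimal Python sketch of such an initial state, under stated assumptions: the state is a set of ground tuples, the predicate names (`Value`, `Column`, `LeftOf`, `Above`, `SubtrahendRow`) are hypothetical stand-ins for the report's expressions, a `LeftOf` fact is recorded only for adjacent columns, and the number line and number facts are omitted.

```python
from typing import Set, Tuple

Fact = Tuple[object, ...]


def encode_problem(minuend: int, subtrahend: int) -> Set[Fact]:
    """Encode a column subtraction problem as a set of ground facts."""
    top, bottom = str(minuend), str(subtrahend)
    width = max(len(top), len(bottom))
    # Row r1 holds the minuend and sits above row r2, the subtrahend row.
    facts: Set[Fact] = {("Row", "r1"), ("Row", "r2"), ("Above", "r1", "r2"),
                        ("SubtrahendRow", "r2")}
    for row, digits in (("r1", top), ("r2", bottom)):
        # Columns are numbered from right to left, so c1 is the units column.
        for k, digit in enumerate(reversed(digits), start=1):
            facts.add(("Value", int(digit), f"c{k}", row))
    for k in range(1, width + 1):
        facts.add(("Column", f"c{k}"))
        if k > 1:
            facts.add(("LeftOf", f"c{k}", f"c{k - 1}"))
    return facts


state = encode_problem(305, 47)
print(sorted(f for f in state if f[0] == "Value" and f[2] == "c1"))
# [('Value', 5, 'c1', 'r1'), ('Value', 7, 'c1', 'r2')]
```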

There are eleven operators in this problem space: select a column, move to the next column, decrement a digit, increment a digit, recall the difference between two single-digit numbers, recall that the difference between two equal numbers is zero, mark a column as the column to increment, mark a column as the column to decrement, move attention to the left, move attention to the right, and write a digit. The operators for the standard subtraction space are shown in Figure 13 (Part 1 and Part 2). The correct procedure for subtraction with regrouping consists of eleven rules that fire those operators. The state constraints for subtraction that we have developed are inspired by Resnick (1984) and by Resnick and Omanson (1987). We will state each rule and constraint as we analyze each example of how HS learns in this domain. A more detailed description of the subtraction model has been given in a previous report (Ohlsson & Rees, 1987).
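Operators of this kind can be sketched in Python as pairs of addition and deletion lists, applied by removing the deleted expressions from the current state and then inserting the added ones. This is an assumed STRIPS-style reading of the operator definitions, not code from HS; only two of the eleven operators (NextColumn and RecallDiff) are shown, the tuple layout of the expressions is hypothetical, and the difference is computed rather than retrieved from number facts.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple

Fact = Tuple[object, ...]


@dataclass(frozen=True)
class Operator:
    name: str
    additions: FrozenSet[Fact]
    deletions: FrozenSet[Fact]

    def apply(self, state: Iterable[Fact]) -> FrozenSet[Fact]:
        # Remove the deletion list, then add the addition list.
        return (frozenset(state) - self.deletions) | self.additions


def next_column(c1: str, c2: str) -> Operator:
    # Moves attention from column c1 to column c2.
    return Operator("NextColumn", frozenset({("Processing", c2)}),
                    frozenset({("Processing", c1)}))


def recall_diff(v1: int, v2: int, c: str) -> Operator:
    # Writes v1 - v2 in the answer row of column c; deletes nothing.
    return Operator("RecallDiff", frozenset({("AnsRow", c, "Value", v1 - v2)}), frozenset())


s0 = {("Processing", "c1")}
s1 = recall_diff(9, 5, "c1").apply(s0)
s2 = next_column("c1", "c2").apply(s1)
print(sorted(s2))  # [('AnsRow', 'c1', 'Value', 4), ('Processing', 'c2')]
```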

Learning experiments in the subtraction domain are not carried out by having HS learn subtraction 
FINAL RULE FOR STANDARD COUNTING:

((Object X1)
 (Current X1)
 (Object X2)
 (Associate X1 N1)
 (Not (Equal X2 X1))
 (Member X2 ToCountSet)
 (Not (First X2))
 (Not (Current N3)
      (Associate X2 N2))) ===> PickNext(X1, X2)

G1: (Designated X2) (Designated N4) (Current X2) (After X2 X1) (Associate X1 N1) ** (Next N1 N4)

FINAL RULE FOR CONSTRAINED COUNTING:
(Case 1: Designated Number Occurs Next)

((Object X1)
 (Current X1)
 (Object X2)
 (Associate X1 N1)
 (Not (Equal X2 X1))
 (Member X2 ToCountSet)
 (Not (First X2))
 (Not (Current N3)
      (Associate X2 N2))
 (Not (Designated X2)
      (Designated N4))) ===> PickNext(X1, X2)

FINAL RULE FOR CONSTRAINED COUNTING:
(Case 2: Designated Number Does Not Occur Next)

((Object X1)
 (Current X1)
 (Object X2)
 (Associate X1 N1)
 (Not (Equal X2 X1))
 (Member X2 ToCountSet)
 (Not (First X2))
 (Not (Current N3)
      (Associate X2 N2))
 (Designated X2)
 (Designated N4)
 (Next N1 N4)) ===> PickNext(X1, X2)

Figure 12: Revisions of rule 2 (see Figure 4) in adaptation to constrained counting.
FirstColumn(C)
Takes a column as input and declares that column as the first column.
The addition list is {(Processing C)}.
The deletion list is empty.

NextColumn(C1, C2)
Takes two columns as inputs, and moves attention from one to the other.
The addition list is {(Processing C2)}.
The deletion list is {(Processing C1)}.

Decrement(R, C1, C2, V)
Takes as input the position that is being decremented during a regrouping operation and the position that is being incremented, writes the new value for the decremented digit, and records that the decrement has occurred.
The addition list is {(BorrowedFrom C1 For C2)(CrossedOut R C1)(R C1 Value V)}.
The deletion list is {(BorrowingFrom C1 For C2)}.

Increment(R, C, V)
Takes as input the position that is being incremented during a regrouping operation, writes the new value for the incremented digit, and records that the increment has occurred.
The addition list is {(Regrouped C)(CrossedOut R C)(R C Value V)}.
The deletion list is {(Regrouping C)}.

RecallDiff(V1, V2, C)
Takes two numbers and a column as inputs, recalls the difference between the values, and writes the result in the answer row of the column.
The addition list is {(AnsRow C Value V3)}, where V3 = V1 - V2.
The deletion list is empty.

Figure 13: Operators for subtraction, Part 1.

from scratch. Instead, we take the correct subtraction procedure and inflict errors of various kinds on it, run HS with the erroneous procedure, and observe whether HS can correct the error or not. We have verified that HS can correct the most frequent errors that have been identified empirically in children's performance. We will illustrate this capability with (a) the smaller-from-larger bug, because it is the most frequent of all bugs, (b) a borrowing bug, because borrowing bugs are conceptually the most difficult bugs, and (c) an ordering bug, because it provides a contrast to the other bugs. More extended examples of learning in the subtraction domain can be found in Ohlsson and Rees (1987).

Recovering from the smaller-from-larger bug
Consider the following faulty rule for subtraction:

If c_x is the current column, v_1 is in column c_x, v_1 is in row r_1, v_2 is in column c_x, v_2 is in row r_2, and v_2 is smaller than v_1, then

RECALLDIFF(v_1, v_2, c_x).

The operator RECALLDIFF creates an expression that encodes the retrieved difference, call it v_3, as the
SameDiff(C)
Takes a column as input and writes zero in the answer row for that column.
The addition list is {(AnsRow C Value 0)}.
The deletion list is empty.

MarkColumn(C)
Takes a column as input and marks it as the column needing to be regrouped.
The addition list is {(Regrouping C)}.
The deletion list is empty.

FindColumn(C1, C2)
Takes the column to be regrouped and a second column as inputs, and marks the second as the column to borrow from in order to regroup the first.
The addition list is {(BorrowingFrom C2 For C1)}.
The deletion list is empty.

ShiftLeft(C1, C2, C3)
Takes three columns as inputs, and designates C3 and C1 as the columns to be decremented and incremented, respectively.
The addition list is {(Regrouping C1)(BorrowingFrom C3 For C1)}.
The deletion list is {(Regrouping C2)(BorrowingFrom C1 For C2)}.

ShiftRight(C1, C2)
Takes two columns as inputs, and designates the second one as the one to be incremented.
The addition list is {(Regrouping C2)}.
The deletion list is {(BorrowingFrom For C2)}.

WriteValue(C, R, V)
Takes a position, given by a column C and a row R, and a value V as inputs, and writes V in the given position.
The addition list is {(R C Value V)}.
The deletion list is empty.

Figure 13: Operators for subtraction, Part 2.
result for column c_x, i.e., O_a contains the single expression v_3 is the result in column c_x. RECALLDIFF does not delete any expressions, i.e., O_d is empty. This rule ignores the distinction between the minuend and the subtrahend, thus causing the so-called smaller-from-larger bug (Brown & Burton, 1978).

The principle that is violated by the above rule consists of two ideas. First, the purpose of subtraction is to take the subtrahend from the minuend. Second, in the arithmetic of whole numbers, subtraction is undefined when the minuend is smaller than the subtrahend. The constraint given to HS is:

If row r_sub is the subtrahend row, row r_min is above r_sub, v_min is in c_x, v_min is in row r_min, v_sub is in c_x, v_sub is in row r_sub, and v_min is smaller than v_sub, then not(v is the result in column c_x).

If the minuend in a particular column is smaller than the subtrahend, then there should be no result in that 
column. It should be noted that the satisfaction pattern is enclosed in a "not" meaning that the constraint 
is satisfied when the pattern does not match. Also, once the column has been regrouped, the new 
minuend will not be smaller and this constraint will cease to be relevant. 
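The violation test can be sketched in Python for ground patterns: a constraint is violated when its relevance pattern matches the state and its satisfaction pattern does not, with the test inverted when the satisfaction pattern is enclosed in "not". The sketch assumes fully instantiated patterns (HS matches patterns that contain variables), and the fact names are hypothetical; the example is a right-most column whose minuend digit is 5 and whose subtrahend digit is 9.

```python
from typing import FrozenSet, Tuple

Fact = Tuple[object, ...]


def violated(state: FrozenSet[Fact], relevance: FrozenSet[Fact],
             satisfaction: FrozenSet[Fact], negated: bool = False) -> bool:
    """True if the constraint is relevant in `state` but not satisfied."""
    relevant = relevance <= state           # every relevance expression is present
    matches = satisfaction <= state
    satisfied = (not matches) if negated else matches
    return relevant and not satisfied


relevance = frozenset({("SubtrahendRow", "r_sub"), ("Above", "r_min", "r_sub"),
                       ("Value", 5, "c_x", "r_min"), ("Value", 9, "c_x", "r_sub"),
                       ("Smaller", 5, 9)})
satisfaction = frozenset({("Result", 4, "c_x")})   # enclosed in "not"

before = relevance
after = relevance | satisfaction                   # the faulty RECALLDIFF has fired
print(violated(before, relevance, satisfaction, negated=True))  # False
print(violated(after, relevance, satisfaction, negated=True))   # True
```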
When applied to the right-most column in, for example, a problem with minuend 505 and a subtrahend whose units digit is 9, the rule condition R becomes instantiated to

c_x is the current column, 9 is in column c_x, 9 is in row r_sub, 5 is in column c_x, 5 is in row r_min, and 5 is smaller than 9,

and the addition list O_a becomes (4 is the result in column c_x). The relevance pattern of the constraint becomes instantiated to

r_sub is the subtrahend row, row r_min is above r_sub, 5 is in c_x, 5 is in row r_min, 9 is in c_x, 9 is in row r_sub, 5 is less than 9,

and the satisfaction pattern becomes

4 is the result in column c_x.

Since having any result in this column violates the constraint, HS tries to learn from the violation. Obviously, this rule should never fire when the subtrahend is greater than the minuend. To put it another way: if this rule fires when the constraint is relevant, the constraint is guaranteed to be violated. Thus, the rule should fire only when the constraint is not relevant. The learning mechanism does attempt to create two revisions to the rule, but it is successful in only one case.

Revision 1. Ensuring that the constraint is not relevant. First, HS computes (C_r - O_a), using the instantiations of these patterns. However, RECALLDIFF adds only the single expression that matches the satisfaction pattern, so the result is C_r. Next, HS removes any parts that are already part of the rule pattern. The result is a single expression which is part of C_r, but not part of either O_a or R: r_min is above r_sub. HS replaces the constants r_min and r_sub with the appropriate variables, and creates a new rule by adding the negation of this expression to the condition of the faulty rule:

not(r_2 is above r_1).

This correction cures HS of the smaller-from-larger bug.

Revision 2. Ensuring that the constraint is satisfied. HS computes (C_s - O_a). However, the result is empty, because RECALLDIFF adds the single expression that matches the satisfaction pattern. The learning mechanism stops at this point and does not attempt to create a second rule.

Recovering from a borrowing bug

The following incorrect subtraction rule finds a column to borrow from when regrouping is needed:

If c_x is the column to be regrouped, c_y is a column, v is in column c_y, v is in row r_min, row r_sub is the subtrahend row, and row r_min is above row r_sub, then FINDCOLUMN(c_x, c_y).

The rule says that if a particular column needs to be regrouped and there is a second column that
contains a minuend value, then mark the second column to be borrowed from in order to regroup the first column. FINDCOLUMN adds a single expression representing the fact that c_y is to be borrowed from to regroup c_x. It does not make any deletions. This rule will choose any column to borrow from. If, for instance, a particular problem contains three columns, this rule will match three times, once for each column (including the column that is supposed to be regrouped). This rule produces several paths which result in different subtraction bugs. For instance, if the column to the left of the column to be regrouped contains a zero in the minuend, one of the paths will produce the well-known borrow-across-zero bug (Brown & Burton, 1978). This error is produced because this rule attempts to initiate borrowing from all columns. It does not detect the zero and deliberately skip it. Other paths produced by this faulty rule generate other, not necessarily observed, subtraction bugs.
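The over-general matching can be sketched in Python, together with a corrected variant that adds the condition "c_y is to the left of c_x". The sketch assumes, hypothetically, that every column holds a minuend digit, that columns are named c1, c2, ... from right to left, and that a LeftOf fact is recorded only for adjacent columns, so the added condition picks out the column immediately to the left. The function name `findcolumn_matches` is illustrative, not part of HS.

```python
from typing import List


def findcolumn_matches(columns: List[str], cx: str, corrected: bool = False) -> List[str]:
    """Columns c_y the FINDCOLUMN rule could select when regrouping column cx."""
    # columns[0] is the units column; LeftOf is recorded for neighbours only.
    left_of = {(columns[i + 1], columns[i]) for i in range(len(columns) - 1)}
    return [cy for cy in columns if not corrected or (cy, cx) in left_of]


cols = ["c1", "c2", "c3"]
print(findcolumn_matches(cols, "c1"))                   # ['c1', 'c2', 'c3']
print(findcolumn_matches(cols, "c1", corrected=True))   # ['c2']
```

As in the text, the faulty version matches once per column, including the column being regrouped.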

It is possible to apply principled knowledge to this rule to produce a correct rule. The relevant principle states that the column that is borrowed from during regrouping should be just to the left of the column that is being regrouped.23 This principle is expressed in the following state constraint:

If c_x is the column to be regrouped and c_y is the column to borrow from, then c_y is to the left of c_x.

There are two differences that should be noted between this constraint and the previous one. First, the satisfaction pattern is not enclosed inside a "not". Thus, the constraint is satisfied when the satisfaction pattern matches rather than when it does not match. Second, because the rule will fire only when there is a column to regroup and because the operator always adds a column to borrow from, this rule is guaranteed to make the constraint relevant. Thus, the task of learning is to ensure that it will fire only when it will also make the constraint satisfied.

Revision 1. Attempt to ensure that the constraint is not relevant. The difference between the relevance pattern and the operator's addition, (C_r - O_a), is: c_x is the column to be regrouped. This clause is already part of the rule pattern, however, and adding the negation of it to the rule would produce a new rule that cannot possibly match. Thus, this branch of learning ceases without producing a new rule.

Revision 2. Ensure that the constraint is satisfied. Because the satisfaction pattern and the operator's addition do not overlap, (C_s - O_a) is just C_s. The attempt to compute Revision 1 showed that there is nothing from the relevance pattern to add, because (C_r - O_a) is already present in the rule pattern. The revision is to add c_y is to the left of c_x to the rule, which produces a correct rule.



23This particular HS model of subtraction explicitly increments and decrements columns that have zeroes in the top row. In the algorithm taught in schools this process is sometimes abbreviated to crossing out the zero and writing a nine, then decrementing the next column to the left.
Recovering from an ordering bug

Upon learning, new rules often appear in pairs. One rule of the pair will fire when the particular constraint will not become relevant, and the other rule will fire when the constraint will become relevant and satisfied. In the previous two examples, only one of the two revisions succeeded, so only one new rule was created in each learning step. In this final example of error correction in subtraction, two new rules are produced.

The relevant rule decides which column to start with in a subtraction problem. It will choose any column, not just the rightmost (i.e., units) column:

If there is no current column and c_x is a column, then

FIRSTCOLUMN(c_x)

FIRSTCOLUMN adds (c_x is the current column) to working memory and does not delete anything. Like the faulty borrowing rule discussed above, this rule produces branching in the search space. Various odd results are possible along the various branches. For instance, if the rule for choosing the next column to work on correctly chooses the next column to the left, then it might happen that one or more columns to the right are never processed. If the rule for selecting the next column is faulty as well, then columns may be processed in any arbitrary order.

The principle that is violated is that columns should be processed in right-to-left order. The corresponding state constraint says that if a column is being processed and it is to the left of another column, there should be an answer in that other column:

If c_x is the current column and c_x is to the left of c_y, then v is the result in c_y.

This constraint is sufficient to catch both errors in choosing the first column and errors in choosing the next column.

Revision 1. Ensuring that the constraint is not relevant. FIRSTCOLUMN adds the current column, so (C_r - O_a) is the second clause in C_r: c_x is to the left of c_y. Adding the negation of this expression to the rule produces the obvious requirement that the first column cannot be to the left of any other column. This is, of course, the correct rule.

Revision 2. Ensuring that the constraint is satisfied. The satisfaction pattern and the addition do not overlap, so (C_s - O_a) is just C_s: v is the result in c_y. Adding the expression computed for Revision 1 and this expression produces the following rule:

If there is no current column, c_x is a column, c_x is to the left of c_y, and v is the result in c_y, then FIRSTCOLUMN(c_x)

In the particular representation of subtraction we have chosen for this application, once processing has
started there is always a current column. Thus, it is not possible for there to be no current column and a column with an answer in it at the same time, which means that this rule will never match. Because reasoning about the representation is required to discover that this rule will not match, this conclusion is beyond the power of the learning mechanism, so this rule is added to the rule set. This addition is useless but harmless.
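The two revisions worked through in these examples can be summarized in a small Python sketch. It is a simplification, not the HS learning mechanism itself: patterns are ground sets of expressions in which variable names are treated as plain symbols (no matching or unification), operators are assumed to make no deletions, and a negated conjunction is represented by the hypothetical tuple ("Not", frozenset(...)). Revision 1 adds the negation of the part of the relevance pattern that neither the operator nor the rule supplies; Revision 2 adds that part together with whatever the satisfaction pattern needs beyond the operator's additions. Each revision is abandoned when it has nothing to add.

```python
from typing import FrozenSet, List, Tuple

Fact = Tuple[object, ...]


def revise(condition: FrozenSet[Fact], additions: FrozenSet[Fact],
           relevance: FrozenSet[Fact], satisfaction: FrozenSet[Fact]) -> List[FrozenSet[Fact]]:
    """Return the rule conditions created from one constraint violation."""
    new_conditions = []
    # Parts of the relevance pattern that neither the operator nor the rule supplies.
    missing = relevance - additions - condition
    # Revision 1: the constraint must not become relevant.
    if missing:
        new_conditions.append(condition | {("Not", missing)})
    # Revision 2: the constraint must become relevant and satisfied.
    needed = satisfaction - additions
    if needed:
        new_conditions.append(condition | missing | needed)
    return new_conditions


# Ordering bug: If there is no current column and c_x is a column, then FIRSTCOLUMN(c_x).
rule = frozenset({("NoCurrentColumn",), ("Column", "cx")})
adds = frozenset({("Current", "cx")})
rel = frozenset({("Current", "cx"), ("LeftOf", "cx", "cy")})
sat = frozenset({("Result", "v", "cy")})
revisions = revise(rule, adds, rel, sat)
print(len(revisions))  # 2
```

For the ordering bug the sketch yields both rules described above: one with the negated LeftOf clause and one that also demands a result in c_y. For the borrowing bug, where (C_r - O_a) is already in the rule, it yields only the Revision 2 rule.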

The above examples are simplified in several respects. (a) We usually give HS several deficient initial rules, and we inflict more severe deficiencies on them, so the system starts out with a mixture of different bugs, rather than with a single bug. (b) A severely deficient rule usually violates several constraints, and so has to be revised repeatedly. (c) In order to make the computation of the patterns to be added to faulty rules easier to follow, we have not shown any operators that perform deletions from working memory. (d) For the same reason, we have not shown any constraints that include negated subpatterns. The subtraction model that these examples of error corrections are taken from has been discussed in more detail in Ohlsson and Rees (1987).

Discussion 

The behavior of the HS system has several interesting features. First, HS necessarily learns while doing. Only by executing its procedure can the system discover that it generates invalid search states. The principled and the procedural knowledge only communicate through the representation of a particular problem situation. Unless the procedure is applied to some problem, there is no way that HS can discover inconsistencies between its procedural and its declarative knowledge. The design of the system is such that HS, like humans, must act in order to learn.

Second, HS is not dependent upon external feedback. It uses its principled knowledge to monitor its own performance, and to discover errors along the path to an answer. It catches itself in mid-air, as it were, learns, and starts over on the current task before it reaches an answer. This type of behavior is frequently observed in human learners, but is difficult to explain with experience-based learning mechanisms.

Third, HS learns gradually. Rules have to be revised repeatedly. The fact that a rule has been cured of violating one constraint does not guarantee that it will not violate some other constraint. Successive transformations are needed to construct a correct rule even for such a simple task as counting, as the examples above show. Since the learning mechanism works by revising existing rules, the output from one learning step is the input to the next learning step. For HS as for humans, the construction of a new procedure is necessarily a gradual process.

Fourth, the learning mechanism of HS revises a rule by splitting it into two different rules, each version constrained in a different way with respect to the original rule. In most situations only one of those versions is correct from the point of view of the target procedure, and the other version is a so-called monster rule, i.e., a syntactically correct and executable rule that is not part of the procedure to be learned. In some cases the monster rule can be weeded out on the basis of syntactic criteria, but in many cases it is impossible to decide whether a rule is fruitful or not by inspecting the rule. In those cases both
versions of the rule are executed in future trials, and HS gets rid of the monster rule by further learning. The monster rules are executed and constrained repeatedly, until they are so constrained that they cannot match any search state. They are then harmless and have, functionally speaking, been deleted.24 If we think of HS's learning as a search through the procedure space, we can describe this phenomenon by saying that HS does not have a criterion for when it has reached the goal state, i.e., a correct procedure. Therefore, it has to continue searching in order to verify that there are no further improvements to make.

The fact that HS weeds out monster rules by further learning constitutes a prediction that human learners will continue to make mistakes even after they have acquired the correct rules for a procedure. The reason is that they have not yet learned to ignore the alternative, incorrect rules that were constructed in the same learning step as the correct rule. Further practice is necessary in order to get rid of those rules. Hence, HS predicts that practice will be beneficial for some period of time over and above what is needed in order to reach correct performance. This point illustrates well the complex interactions between knowledge and practice. It also illustrates the necessity of implementing and running information processing models. The result that further practice is necessary even after the correct rule has been constructed is a rather complicated, and unanticipated, prediction from our theory that we almost certainly would not have discovered without a computer implementation of the theory.

Fifth, HS can transfer a procedure from one task to another. The flexibility of HS's procedure for
counting does not reside in the final procedure that HS learns. The set of final production rules learned by
HS is, taken by itself, as brittle a procedure as any other. It is only when those rules are executed in the
context of the state constraints that flexibility is achieved. The flexibility of HS does not reside in the type 
of procedure it learns, or in the problem solving method embodied in that procedure, but in the fact that 
the procedure is executed within a cognitive context that includes principled knowledge of the task 
environment. 

Sixth, HS finds it easier to transfer between tasks in one direction than in the other. For instance, the learning process that transforms a procedure for unordered counting into a procedure for ordered counting is not the same as the process that transforms a procedure for ordered counting into a procedure for unordered counting. Depending upon which constraints are violated, the number of learning steps involved in adapting from one task to another may be different from the number of learning steps required to adapt in the opposite direction. This constitutes a prediction that transfer of training between pairs of tasks is asymmetric.

Seventh, learning in HS consists of a transition from knowledge-based to procedure-based performance. In the initial phase of learning, the system makes much use of its principled knowledge, because the grossly incomplete procedure makes errors at every step. As the procedure is gradually



24We could model actual deletion of such rules by assigning weights to rules, and postulating (a) that the weight decays over time unless the rule is fired, and (b) that rules with a weight below some threshold value are purged from the system. We have not implemented such a mechanism in the current version of the HS model.
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completed, fewer and fewer of the steps are incorrect, and the state constraints kick Into action less and 
less. At the end of learning. th6 state constraints have dropped out of sight completely, because the 
production rules now generate only correct solutions, if we assume that the state constraints have levels 
of activation and that the activation level Is a function of how often the constraint is violated, then HS 
models the transition from mindful action, in which all steps are thought about in relation to the system's 
principled knowledge, to routine action, in which an already mastered procedure is simply mn off as it 
were, without much thought. The principled knowledge of HS only plays a role in its pertomiance when 
sometfiing goes wrong, i. e.. some inconsistency between the current state of the world and its knowledge 
Is detected, in short. HS only thinks, as it were, about the current problem when it is forced to do so by 
some difficulty. 
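The seventh point rests on an assumption that the report leaves open: that a constraint's activation is some function of how often it is violated. The following is a minimal sketch in Python (our choice of language; the report contains no code) that assumes one such function, an exponential moving average of violations with a hypothetical parameter `rate`. Neither the function nor the parameter is part of the HS model as described.

```python
def activation_trace(violations: list[bool], rate: float = 0.5) -> list[float]:
    """Activation of one state constraint after each problem-solving step.

    Hypothetical model: activation moves a fraction `rate` of the way toward
    1.0 on a step where the constraint is violated, and toward 0.0 on a step
    where it is not. This is an illustrative assumption only.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    activation = 0.0
    trace = []
    for violated in violations:
        target = 1.0 if violated else 0.0
        activation += rate * (target - activation)
        trace.append(activation)
    return trace

# Early in learning the incomplete procedure violates the constraint often;
# once the procedure is correct, the constraint's activation fades.
print(activation_trace([True, True, False, False, False, False]))
# [0.5, 0.75, 0.375, 0.1875, 0.09375, 0.046875]
```

Under this assumption, "mindful" and "routine" action differ only in whether a constraint's activation is still high; any threshold separating the two would be a further free parameter.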

Eighth, adaptation to a new task involves revision of those rules that are not appropriate for the new task. Rules that are inappropriate will be revised, because they will violate some constraint for the new task. Rules that are appropriate for the new task will not be revised, since they do not cause any constraint violations. Hence, by construction, HS knows which parts of a procedure to retain and which to revise when faced with a change in the task demands. Like humans, HS can build on what it has previously learned when learning a new procedure.
Relations to Previous Research

The purpose of this section is to outline the major conceptual differences between the HS model and other computational models of the acquisition of arithmetic procedures. To the best of our knowledge, there are only two previous analyses of the problem of deriving arithmetic procedures from principled knowledge, both of which make use of so-called planning nets, but neither of which resulted in an implemented simulation model (VanLehn & Brown, 1980; Greeno, Riley, & Gelman, 1984; Smith, Greeno, & Vitolo, in press). We also know of two efforts to simulate human procedure acquisition in arithmetic which employ experience-based, rather than knowledge-based, learning methods (VanLehn, 1983a, 1983b, 1985a, 1985b, 1986; Neches, 1981, 1982, 1987; Neches & Hayes, 1978).

Planning net analyses of arithmetic procedures

VanLehn and Brown (1980) have pointed out that a program for a procedure does not reveal the purpose of that procedure. Programs and flow diagrams specify the steps of a procedure and the conditions under which those steps are to be carried out, but they do not describe the reason why a particular step is included in the procedure, or why it is executed under those conditions. For instance, the procedure for carrying in multi-column addition can be described as follows: when the sum of column n is larger than nine, then detach the units part of the sum, record that part as the result for column n, and add the remaining part to the column to the left. But this description does not reveal that the purpose of the carrying operation is to make sure that each exponent of ten is represented by a single-digit coefficient in the answer. VanLehn and Brown (1980) introduce the term "teleological semantics" to refer to a description of the purpose of the steps in a procedure.
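To make the contrast concrete, here is a minimal sketch in Python (our choice; the report gives no code). The function name `add_with_carry` and the digit-list representation are our own assumptions, and the inputs are assumed to be non-negative integers. The loop body follows the step-level description of carrying; the two assertions at the end state the purpose of carrying, which that description leaves implicit.

```python
def add_with_carry(a: int, b: int) -> list[int]:
    """Multi-column addition of non-negative integers.

    Returns the answer's digits, least significant (units) column first.
    """
    xs = [int(d) for d in reversed(str(a))]
    ys = [int(d) for d in reversed(str(b))]
    answer: list[int] = []
    carry = 0
    for n in range(max(len(xs), len(ys))):
        column_sum = carry
        column_sum += xs[n] if n < len(xs) else 0
        column_sum += ys[n] if n < len(ys) else 0
        # Step-level rule: if the column sum is larger than nine, record its
        # units part for column n and add the rest to the column to the left.
        answer.append(column_sum % 10)
        carry = column_sum // 10
    if carry:
        answer.append(carry)
    # Teleological check: each power of ten has a single-digit coefficient,
    # and the coefficients together still represent a + b.
    assert all(0 <= d <= 9 for d in answer)
    assert sum(d * 10**k for k, d in enumerate(answer)) == a + b
    return answer

print(add_with_carry(478, 345))  # [3, 2, 8], i.e. 823
```

A program like this runs correctly whether or not the assertions are present, which is precisely VanLehn and Brown's point: the purpose is not recoverable from the steps alone.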

Drawing upon A.I. analyses of planning, VanLehn and Brown (1980) proposed a methodology for generating a procedure from a goal in such a way that the trace of the generation constitutes a teleological semantics for the procedure. Their methodology assumes that planning begins with a goal, a set of operations, a set of planning heuristics, and a characterization of a problem situation. Planning begins by posing the goal, and proceeds by expanding it, i.e., replacing it with a structure consisting of subgoals and/or executable operations. Each step in the process is guided by a planning heuristic. The process continues until all goals have been expanded into executable operations, and the execution of the procedure does not contradict any features of the problem situation. The trace of a planning process consists of a graph in which the nodes are (partial) procedures, i.e., procedures that contain yet-to-be-expanded subgoals. The links between the nodes are labelled with the planning heuristic that led from one procedure to the next. VanLehn and Brown (1980) call such a trace a planning net.
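The generation process just described can be sketched in Python as follows. This is a simplified illustration under our own assumptions, not VanLehn and Brown's formalism: a procedure is a tuple of steps, each step is either an operation or an unexpanded goal, the first applicable heuristic is always used, and only the single path taken is recorded (a full planning net would also contain alternatives). The heuristic names in `HEURISTICS` are hypothetical.

```python
from typing import Callable, Optional

# A (partial) procedure is a tuple of steps. A step is either an executable
# operation ("op", name) or a yet-to-be-expanded subgoal ("goal", name).
Step = tuple[str, str]
Procedure = tuple[Step, ...]
Heuristic = Callable[[str], Optional[list[Step]]]
Link = tuple[Procedure, str, Procedure]

def plan(goal: str, heuristics: dict[str, Heuristic]) -> list[Link]:
    """Expand goals until only operations remain; return the labelled trace.

    Each link (N, H, M) records that applying heuristic H to procedure N
    yielded procedure M.
    """
    current: Procedure = (("goal", goal),)
    trace: list[Link] = []
    while any(kind == "goal" for kind, _ in current):
        i = next(k for k, (kind, _) in enumerate(current) if kind == "goal")
        for label, heuristic in heuristics.items():
            expansion = heuristic(current[i][1])
            if expansion is not None:
                expanded = current[:i] + tuple(expansion) + current[i + 1:]
                trace.append((current, label, expanded))
                current = expanded
                break
        else:
            raise ValueError(f"no heuristic expands goal {current[i][1]!r}")
    return trace

# Hypothetical heuristics for two-column addition, for illustration only.
HEURISTICS: dict[str, Heuristic] = {
    "split-into-columns": lambda g: [("goal", "add-units"), ("goal", "add-tens")]
    if g == "add-numbers" else None,
    "write-units-carry-rest": lambda g: [("op", "sum-units"), ("op", "write-units"), ("op", "carry")]
    if g == "add-units" else None,
    "write-column-sum": lambda g: [("op", "sum-tens"), ("op", "write-tens")]
    if g == "add-tens" else None,
}

for before, label, after in plan("add-numbers", HEURISTICS):
    print(label, "->", [name for _, name in after])
```

The link labels, not the final list of operations, carry the teleological information: they record why each step is present.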

VanLehn and Brown (1980) invented planning nets in order to compare procedures with respect to closeness or similarity. They found that program-level representations of procedures do not yield reasonable similarity metrics: a minor conceptual change in a procedure can give rise to huge differences in the program for that procedure. They propose a closeness metric based on the planning net representation that does reproduce intuitive judgments about similarity between procedures. They use the metric to discuss the pedagogical merit of concrete models for arithmetic such as Dienes blocks, and to design a sequence of concrete models for instruction in subtraction (VanLehn & Brown, 1980, pp. 132-136). The planning mechanism is not implemented as a computer program. They do not claim that the process of deriving a planning net for an arithmetic procedure corresponds to the mental process of someone who is trying to learn that procedure.

The idea of deriving a procedure by successively expanding goals into operations within the constraints imposed by a particular problem situation was taken up by Greeno and co-workers in their theory of counting competence (Greeno, Riley, & Gelman, 1984; Riley & Greeno, 1980; Smith & Greeno, 1983; Smith, Greeno, & Vitolo, in press). The basic assumption of their theory is that knowledge of principles is encoded in action schemata. A schema is an action described at a high level of abstraction. The description includes inputs (prerequisites), success criteria (postrequisites), outputs (consequences and effects), and conditions that have to remain true during the execution of the action (corequisites). For instance, the following schema describes the action of picking up an object:

PICK-UP(a)

Prerequisites: movable(a);
               empty(Hand).
Consequences:  in(a, Hand).

The PICK-UP schema says that the prerequisites for picking up an object a are that a is movable and that one's hand is empty. The consequence of picking up an object is that the object is in the hand (Greeno, Riley, & Gelman, 1984, p. 105). The PICK-UP schema is an example of a schema that can be executed without expansion into other schemata. Executable schemata correspond to what are called operators in most computational models of problem solving.
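The operator reading of an executable schema can be sketched as follows. This is a Python illustration under our own assumptions: facts are ground strings, variables are filled in by the hypothetical helper `pick_up`, and execution simply adds the listed consequences. Because the schema as quoted names no facts that stop holding, the sketch removes none.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionSchema:
    """An action described by its requisites, over ground facts (strings)."""
    name: str
    prerequisites: frozenset[str] = frozenset()
    corequisites: frozenset[str] = frozenset()
    postrequisites: frozenset[str] = frozenset()
    consequences: frozenset[str] = frozenset()

    def applicable(self, state: frozenset[str]) -> bool:
        # All prerequisites must hold in the current state.
        return self.prerequisites <= state

    def execute(self, state: frozenset[str]) -> frozenset[str]:
        if not self.applicable(state):
            raise ValueError(f"prerequisites of {self.name} are not satisfied")
        # Only the listed consequences are added; no facts are deleted.
        return state | self.consequences

def pick_up(a: str) -> ActionSchema:
    return ActionSchema(
        name=f"PICK-UP({a})",
        prerequisites=frozenset({f"movable({a})", "empty(Hand)"}),
        consequences=frozenset({f"in({a}, Hand)"}),
    )

state = frozenset({"movable(cup)", "empty(Hand)"})
print(pick_up("cup").execute(state) >= {"in(cup, Hand)"})  # True
```

A non-executable schema such as MATCH would need the corequisite and postrequisite fields as well; they are declared here but unused by `execute`.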

Knowledge of the counting principles is encoded in a total of twelve different action schemata, most of 
them considerably more complicated than the PICK-UP schema. A central schema is the description of 
the action of mapping a set onto a subset of another set: 

MATCH(X, Y)

Prerequisites:  empty(A);
                empty(B).
Corequisites:   subset(A, X), where A = {x: tagged(x)};
                subset(B, Y), where B = {y: used(y)};
                equal(A, B).
Postrequisites: For all x, member(x, X) -> member(x, A).
Consequence:    equal(X, B).

The MATCH schema says that in order to match a set X to a set Y, we must first have an empty subset of each set. We then act on those subsets (in some manner that is not specified in the schema itself) until the subset A of X becomes equal to X itself. We cause A to grow, as it were, until it includes all of X. While doing this, we make sure (in some yet-to-be-specified way) that it always remains the case that the subset B of Y has the same number of members as A, i.e., we cause B to grow at the same rate as A. The result of acting in this way is that when A includes all of X, X is guaranteed to have the same number of elements as B. Since B is a subset of Y, X has thereby been mapped onto Y. The MATCH schema is part of the encoding of the one-to-one mapping principle (Greeno, Riley, & Gelman, 1984, p. 113). It is an example of a non-executable schema; it cannot be executed as it stands, but has to be expanded into executable schemata.

Footnote: The two properties tagged and used serve bookkeeping purposes in the Greeno et al. (1984) analysis of counting.
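One way to see what MATCH guarantees is to supply one concrete expansion and check its requisites as assertions. The Python sketch below does that under our own assumptions: X and Y are lists of distinct elements, the unspecified "way" of growing the subsets is simply to pair the next member of X with the next unused member of Y, and "equal" is read as "has the same number of members". The function name `match` is ours.

```python
def match(X: list[str], Y: list[str]) -> dict[str, str]:
    """Map the distinct elements of X one-to-one onto a subset of Y.

    A plays the role of the tagged members of X and B the used members of Y.
    """
    A: set[str] = set()  # prerequisite: empty(A)
    B: set[str] = set()  # prerequisite: empty(B)
    unused = list(Y)
    pairing: dict[str, str] = {}
    for x in X:  # grow A until it includes all of X
        if not unused:
            raise ValueError("Y has too few elements to receive X")
        y = unused.pop(0)
        A.add(x)
        B.add(y)  # grow B at the same rate as A
        pairing[x] = y
        # Corequisites, checked after every step: A and B are subsets of
        # X and Y, and they have the same number of members.
        assert A <= set(X) and B <= set(Y) and len(A) == len(B)
    assert A == set(X)       # postrequisite: every member of X is in A
    assert len(B) == len(A)  # consequence: X and B have the same number of members
    return pairing

print(match(["cat", "dog", "bird"], ["one", "two", "three", "four"]))
# {'cat': 'one', 'dog': 'two', 'bird': 'three'}
```

The assertions make visible that the consequence follows from the corequisites plus the postrequisite, which is the set-theoretic content discussed later in this section.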

The computational mechanism postulated in the action schema theory is a planning mechanism that bears a family resemblance to the type of mechanism sketched by VanLehn and Brown (1980). It takes as inputs the goal of deciding the cardinality of a set, the collection of twelve action schemata, and a setting. The setting describes the problem and the physical situation in which the problem is to be solved. The planning mechanism consists of two components. The first component is a mechanism for backward chaining that matches the goal against the consequences of the action schemata. Schemata that can satisfy that goal are posed as potential actions in the plan. The prerequisites of those schemata are then posed as subgoals. This process continues until all goals are satisfied either by the setting or by the consequences of executable schemata that are included in the plan. The second component of the planning mechanism is a theorem prover that decides whether a particular pre-, co-, or postrequisite is satisfied in a particular setting by trying to prove that requisite as a theorem.
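The backward chaining component can be sketched in Python roughly as follows. This is our own minimal reconstruction, not the mechanism of Greeno et al.: facts are ground strings, a literal lookup in the setting stands in for the theorem prover, sibling subgoals are taken in the order listed, and consequences achieved by earlier subplans are not added to the setting. The PUT-DOWN schema is a hypothetical addition used only to produce a two-step plan.

```python
from typing import NamedTuple, Optional

class Schema(NamedTuple):
    name: str
    prerequisites: tuple[str, ...]
    consequences: tuple[str, ...]

def backchain(goal: str, schemata: list[Schema], setting: frozenset[str],
              path: frozenset[str] = frozenset()) -> Optional[list[str]]:
    """Return schema names, in execution order, that achieve goal (or None).

    A goal counts as satisfied if it is literally present in the setting;
    this set lookup stands in for the theorem prover, which is not modelled.
    """
    if goal in setting:
        return []
    if goal in path:  # do not pose a goal as a subgoal of itself
        return None
    for schema in schemata:
        if goal not in schema.consequences:
            continue
        plan: list[str] = []
        # Prerequisites become subgoals, taken in the order listed.
        for prerequisite in schema.prerequisites:
            subplan = backchain(prerequisite, schemata, setting, path | {goal})
            if subplan is None:
                break
            plan += subplan
        else:
            return plan + [schema.name]
    return None

SCHEMATA = [
    Schema("PICK-UP(a)", ("movable(a)", "empty(Hand)"), ("in(a, Hand)",)),
    Schema("PUT-DOWN(b)", ("in(b, Hand)",), ("empty(Hand)",)),
]
SETTING = frozenset({"movable(a)", "in(b, Hand)"})
print(backchain("in(a, Hand)", SCHEMATA, SETTING))  # ['PUT-DOWN(b)', 'PICK-UP(a)']
```

Taking sibling subgoals in a fixed order is a shortcut: it hides the subgoal ordering problem rather than solving it.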

The trace of the planning mechanism is a graph that Greeno et al. (1984) call a planning net, with reference to the work by VanLehn and Brown (1980). However, there is little formal resemblance between the two types of graphs. The planning nets of VanLehn and Brown (1980) have partial procedures as nodes. Links are labelled with planning heuristics. The label H on the link from node N to node M means that applying planning heuristic H to procedure N yields procedure M (see Figure 18.2, VanLehn & Brown, 1980, p. 115). In contrast, the planning nets in Greeno et al. (1984) have action schemata, tests, and goals as nodes, and the links are labelled as pre-, post-, or corequisites. The meaning of, say, the prerequisite link R from action schema node A to goal node G is that attainment of goal G satisfies prerequisite R for action A (see Figure 4, Greeno et al., 1984, p. 119). The two types of graphs, although formally different, share the purpose of explaining a procedure by relating steps to goals.

The main phenomenon investigated by Greeno et al. (1984) is the flexibility of children's counting performances, in particular the fact that children can adapt their counting procedures to a variety of settings. Flexibility of performance is explained in the action schema theory by the fact that the planning mechanism can derive different procedures for different settings from one and the same set of action schemata. The planning mechanism does not appear to have any resources for making use of the procedure for one setting in the derivation of a procedure for another setting; each procedure is derived de novo.

Footnote: Greeno, Riley, & Gelman (1984, pp. 116-117) incorrectly describe their mechanism as a form of means-ends analysis. However, means-ends analysis consists of computing a difference between a goal and a situation, and retrieving an operator that can reduce that difference from a difference-operator table (Ernst & Newell, 1969). The mechanism described in Greeno et al. (1984) does not compute differences, and does not make use of a difference-operator table.

Footnote: Greeno et al. (1984, p. 122) make a distinction between flexibility and robustness. This distinction is not necessary for the discussion here, so we use the term "flexibility" to cover both concepts.

Since both the state constraint theory proposed here and the action schema theory by Greeno and co-workers address the same psychological phenomenon, it ought to be possible to make a detailed comparison between them with respect to ability to account for data, clarity, simplicity, generality, etc. However, such a comparison is complicated by the fact that the action schema theory is not proposed as a process theory, but as a competence theory. Greeno et al. explicitly reject any claims about the psychological reality of the planning mechanism that they describe:

We note that we do not necessarily identify the process of derivation of planning nets as a plausible psychological hypothesis. As with other hypotheses about competence, we restrict our claim to psychological reality to the content of the knowledge that is attributed to individuals and to the structures that are implied by that knowledge. In our analysis, the relation between competence and performance structures has the form of derivations in which the performance structures ... are consequences of competence structures, derived by a planning system. However, we have not tried to determine the form of the dependence between competence and performance structures in human cognition.

(Greeno, Riley, & Gelman, 1984, p. 104)

We consider the content of the competence in our analysis a plausible set of hypotheses about children's tacit knowledge, but the way in which the three components of competence are used in deriving planning nets should be interpreted as a formal relation, not necessarily corresponding to cognitive mechanisms.

(Greeno, Riley, & Gelman, 1984, p. 138)

In short, the action schema theory spells out the rational connections between the counting principles, encoded as action schemata, and the procedures that generate counting behavior, but the planning process that generates those connections does not (necessarily) correspond to any mental process.

If the computational machinery of the action schema theory is not to be interpreted as a psychological hypothesis, what are the empirical claims of the theory? In what respects can the theory be compared to a process model such as the HS system? In the two excerpts quoted above, Greeno et al. claim that children know the content of the action schemata. But the action schemata are supposed to encode the counting principles, so this claim appears, at first glance, to be a mere restatement of the conclusion by Gelman and Gallistel (1978) that children know those principles.

However, inspection of the action schemata does not support the idea that they are nothing but an encoding of the counting principles. For instance, the MATCH schema (see above) can be paraphrased as saying that if an initially empty subset A of a set X is changed so as to include more and more of X, and if an initially empty subset B of another set Y is changed so as to always have the same size as A, then, when A has become identical to X, B will have the same size as X. This is a rather complicated set-theoretic theorem that cannot reasonably be said to be included among the counting principles. The claim that children have the knowledge encoded in the action schemata is therefore a claim that they know the counting principles plus the other principles embedded in those schemata. But the authors do not specify those other principles.

Identification of which principles are encoded in the action schemata is further complicated by the fact that principles are spread out among the schemata, and that children are not hypothesized to either know or not know the principles:

We did not formulate a schema for understanding of order, another schema for one-to-one correspondence, and so on. Instead, it seemed more reasonable to hypothesize schemata that represent different aspects of the various principles, and often include aspects of different principles. If our analysis is accepted, then competence for each of the principles is distributed among several schemata, rather than being located in any single structure. This emphasizes that a child should not be considered as either having or not having competence regarding any of the principles ... .

(Greeno, Riley, & Gelman, 1984, p. 137)

Even if we had a list of the principles encoded in the action schemata, the evaluation of the claim that children know those principles would still be problematic. The action schemata are hypothesized to be known implicitly (Greeno, Riley, & Gelman, 1984, pp. 106 and 137). Hence, the claim cannot be tested by interviewing children directly about the content of the schemata. Knowledge of the schemata must be inferred from observations of performance. But we do not know what to look for in children's performance, since the action schema theory does not claim psychological reality for its process mechanisms.

However, the action schema theory can be interpreted as making a different kind of empirical claim, although it is not stated explicitly by Greeno et al. (1984). The authors draw an analogy between their work and the Chomskyan methodology for competence theories in the study of syntax. A strict interpretation of this analogy implies that we can assign a psychological interpretation to the set of all counting procedures that can be generated from the action schemata with the described planning mechanism. The theory can be interpreted as claiming that the action schemata and the planning mechanism generate all counting procedures that competent number users would judge as correct, a claim that is, in principle, empirically testable, and which can be used to compare the action schema theory to other theories. For example, it would be interesting to compare the set of counting procedures that can be generated by action schema theory with the set of counting procedures that can be learned by the HS system. Greeno et al. (1984, pp. 137-138) mention the possibility of deriving such a prediction from their theory, but they do not develop it, on the grounds that there is no characterization of the set of all possible procedures analogous to the characterization of the set of all possible strings of symbols in a language.
A major difficulty in the evaluation of the implicit claim that the set of procedures that can be generated from action schema theory coincides with the set of correct counting procedures is that the planning mechanism postulated in that theory is not fully specified. The backward chaining mechanism is given an informal specification that appears precise enough to support implementation (Greeno et al., 1984, pp. 116-117). However, it is radically incomplete: Greeno et al. do not deal with the issue of how to order sibling subgoals, one of the central problems for planning mechanisms. Ordering subgoals is crucial for the derivation of even the simplest procedure. The authors themselves express doubts as to the sufficiency of the computational mechanism they describe (see Greeno et al., 1984, p. 116, footnote 7, and p. 122). Furthermore, the theorem prover that decides whether requisites are satisfied is not described, even in outline. It is supposed to have access to inference rules and general propositions. An example of a general proposition is that objects in a straight line can be ordered, starting at one end and proceeding to the other (Greeno et al., 1984, p. 118). The relation between general propositions and action schemata is not clarified. Without a fully specified computational mechanism, the set of procedures that can be derived from the action schemata is not well defined.

In summary, the planning net analyses of arithmetic procedures by VanLehn and Brown (1980) and by Greeno et al. (1984) are based on the notion of constructing a procedure by successively expanding a goal into a plan for how to achieve that goal. However, the analysis by VanLehn and Brown (1980) is not intended as a psychological theory, but is aimed at the definition of a similarity metric for procedures. The action schema theory of Greeno et al. (1984) is a competence theory, and the empirical claims of the theory are unclear. Neither analysis has been embodied in an implemented system that can generate runnable procedures.

Simulation models of empirical learning in arithmetic

Neches (1981, 1982, 1987) has described the Heuristic Procedure Modification system (HPM), a production system architecture for learning based on the idea that significant improvements in a procedure can be computed by noticing patterns in the internal trace of the procedure, patterns that indicate that some labour-saving transformation of the procedure is possible. The HPM system is based on a typology of strategy transformations that eliminate redundancies, produce shortcuts, replace one method with a computationally more efficient method, etc. (Neches & Hayes, 1978). For instance, if a procedure uses a partial result at two different points in a computation, that procedure can be improved by storing that result when it is first computed, and retrieving it, rather than recomputing it, when it is needed the second time. In order to support the detection of the triggering patterns for these strategy transformations, the HPM architecture stores a very detailed trace of the execution of a procedure. For every expression that is written into working memory, information is stored about the production rule that was responsible for the creation of that expression, the conditions that led to the firing of that rule, the goal that was active when the rule was fired, etc. Each strategy transformation mechanism inspects this trace for the occurrence of the type of redundancy that it is designed to deal with, and transforms the procedure accordingly.

The major phenomenon explained by the HPM system is the discovery of the so-called MIN strategy for simple addition. There is evidence that children who are taught to solve simple addition problems by combining the sets corresponding to the two addends and then counting the combined set quickly realize that they can proceed more efficiently by initializing their counting with the larger addend, and then counting only the elements in the smaller set (Groen & Resnick, 1977). The HPM application to this phenomenon shows how the relevant strategy transformation can be achieved through the elimination of redundancy (Resnick & Neches, 1984). For instance, HPM notices that in counting the combined set, the number corresponding to the larger addend is generated en route to the answer. Since that number can always be retrieved from the problem statement, it is redundant to recompute it. Hence, the counting can begin with the larger addend. The HPM system explains procedure acquisition through the application of content-independent mechanisms to a trace of a procedure. It does not explain the role and function of general knowledge in procedure acquisition.
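The difference between the two addition strategies shows up directly in the sequence of numbers each one generates. The Python sketch below is our own illustration (the function names are ours, not HPM's). It returns each strategy's answer together with the numbers spoken, so that the redundant portion of the count-all sequence is visible.

```python
def count_all(m: int, n: int) -> tuple[int, list[int]]:
    """Combine both sets and count the combined set from one."""
    combined = ["object"] * m + ["object"] * n
    spoken = [k for k, _ in enumerate(combined, start=1)]
    return (spoken[-1] if spoken else 0), spoken

def count_on_from_larger(m: int, n: int) -> tuple[int, list[int]]:
    """MIN strategy: start from the larger addend, count only the smaller set."""
    start = max(m, n)  # taken from the problem statement instead of recounted
    spoken = [start + k for k in range(1, min(m, n) + 1)]
    return (spoken[-1] if spoken else start), spoken

answer, spoken = count_all(2, 5)
print(answer, spoken)  # 7 [1, 2, 3, 4, 5, 6, 7]
answer, spoken = count_on_from_larger(2, 5)
print(answer, spoken)  # 7 [6, 7]
```

Both strategies give the same answer, but the count-all sequence regenerates the numbers up to 5, the larger addend, which the problem statement already supplies. The number of steps in the MIN strategy depends only on the smaller addend.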

VanLehn (1983a, 1985a, 1985b) has described Sierra, a procedure induction system that can generate a subtraction procedure from a set of solved examples. The main phenomenon explained by Sierra is the multitude of bugs in children's performance on multi-column subtraction problems. The Sierra system outputs a set of procedures in response to a sequence of solved examples. One explanatory principle of VanLehn's theory is that procedure induction is an intrinsically hard problem; indeed, some induction problems are known to be unsolvable. As a result, a procedure induced from solved examples can be expected to be incomplete. Incomplete procedures may lead to impasses, situations in which the procedure either cannot determine the next step, or finds that the preconditions for the next step are not satisfied. A second explanatory principle of VanLehn's theory is that the learner deals with impasses by making local changes to his/her procedure (Brown & VanLehn, 1980, 1982). He has identified a small set of general transformations, called repairs, that a learner can apply to a procedure in order to break out of an impasse. For instance, an impasse can be repaired by skipping the step that cannot be carried out, or by replacing it by another step. If the repairs are applied to the procedures generated by Sierra, the result is a set of buggy algorithms, i.e., algorithms that solve subtraction problems, but solve them incorrectly. The Sierra model plus the theory of repairs explain a significant proportion of the subtraction bugs that have been observed in the performance of school children. Neither the procedure induction mechanism nor the repair mechanism makes use of arithmetic principles.
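The impasse-and-repair idea can be illustrated with a small Python sketch. It makes several assumptions of its own: an incomplete subtraction procedure that has no borrowing step, and two hypothetical repairs chosen for illustration (they are not taken from VanLehn's catalogue). Each repair replaces the blocked step with another step. No arithmetic principle is consulted, so the procedure runs to completion and produces a systematically wrong answer.

```python
from typing import Callable

Repair = Callable[[int, int], int]

def swap_operands(top: int, bottom: int) -> int:
    # Replace the blocked step by another step: subtract the other way round.
    return bottom - top

def write_zero(top: int, bottom: int) -> int:
    # Replace the blocked step by writing zero in the answer column.
    return 0

def subtract_without_borrowing(a: int, b: int, repair: Repair) -> int:
    """Column subtraction by a procedure that has no borrowing step.

    Assumes non-negative a and b, with b having no more digits than a.
    """
    top = [int(d) for d in reversed(str(a))]
    bottom = [int(d) for d in reversed(str(b))]
    bottom += [0] * (len(top) - len(bottom))
    digits = []
    for t, s in zip(top, bottom):
        if t >= s:
            digits.append(t - s)
        else:
            # Impasse: t - s would be negative and the procedure has no
            # rule for this case, so a repair is applied locally.
            digits.append(repair(t, s))
    return int("".join(str(d) for d in reversed(digits)))

print(subtract_without_borrowing(52, 17, swap_operands))  # 45 (correct: 35)
print(subtract_without_borrowing(52, 17, write_zero))     # 40
```

On problems that never require borrowing, both repaired procedures give correct answers, so the bugs surface only on a particular class of problems.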

The theory of Kurt VanLehn and co-workers is the dual of the theory proposed here. They assume that school children do not, in fact, consult principled knowledge of arithmetic in the construction of arithmetic procedures, but learn them by rote. The goal of their theory is to provide a computational model of rote learning, and thereby explain the actual behavior of school children. Mathematics educators, on the other hand, assume that school children could, in principle, learn arithmetic procedures in a meaningful way. The goal of the state constraint theory is to provide a computational model of meaningful learning, and thereby illustrate the desired behavior of school children. Obviously, these two research efforts, although based on opposite hypotheses about learning, are complementary rather than contradictory.

Discussion 

The computer simulation technique is applied to educationally relevant task domains with increasing frequency (Ohlsson, 1988a). However, in spite of this fact, and in spite of the large amount of research devoted to the psychology and pedagogy of elementary arithmetic, only three computational models of the acquisition of arithmetic procedures have been proposed prior to the work reported here. The two process models (the strategy transformation model by Robert Neches and the procedure induction/repair model by Kurt VanLehn) both use experience-based learning techniques, and hence do not address the question of the role and function of principled knowledge in procedural learning. The action schema theory of counting competence does address the problem of principled knowledge, but it has not been embodied in a runnable system that can generate behavioral predictions.

The learning of arithmetic procedures is a complex process that is unlikely to have a simple explanation. Each of the theories reviewed addresses a different aspect of arithmetic learning. A complete model of arithmetic learning would presumably be able to plan, to detect and correct mistakes, to detect and eliminate redundancies, and to induce procedures from examples, as well as to repair a procedure in order to break out of an impasse. The action schema theory, the state constraint theory, the strategy transformation theory, the procedure induction theory, and the repair theory are complementary research efforts.

Other research efforts have addressed the issue of the role and function of principled knowledge in procedure acquisition. Anderson (1982, 1983a, 1983b, 1986) has proposed the mechanism of proceduralization, in which a declarative principle, e.g., a geometric theorem, is stepwise contextualized and converted into procedural form. Ohlsson (1987b) has proposed a related model that specifies the conditions under which it is meaningful to apply proceduralization. Both of these theories assume that declarative principles occur as data elements in working memory. The psychological interpretation of this is that principles are known explicitly rather than implicitly. This assumption is plausible with respect to domains like high school geometry and Lisp programming, but not with respect to the domain of counting. Hagert (1986) has proposed a methodology for deriving procedures from abstract specifications which bears a family resemblance to the planning net analyses, but which uses the methodology of logic programming. Procedures are derived from abstract specifications through a deductive argument. Principles of the domain appear as premises in the derivation. The notion of deriving a procedure from an abstract specification has also been investigated in software engineering (see, e.g., Balzer, 1985). Finally, artificial intelligence research has invented the technique of explanation-based learning, in which principled knowledge is used to construct an explanation of why an example is an instance of a particular concept. By collapsing the explanation into a single rule, a general recognition rule for that concept is created without any need to consult further examples (DeJong & Mooney, 1986; Mitchell, Keller, & Kedar-
General Discussion

The first subsection below summarizes the argument we have been making. In the following subsection we state the strengths of the state constraint theory and of the HS model. Finally, we go on to describe the major weaknesses of our theory, and the problems they pose for future work.

Summary 

The research problem addressed in this report is the problem of the function of conceptual understanding in performance and learning, with special emphasis on arithmetic learning. Mathematics educators have proposed the Conceptual Understanding Hypothesis, which claims that if children knew the concepts and principles of arithmetic, acquisition of computational algorithms would proceed smoothly. If children understood what they are doing, this hypothesis claims, they could discover procedures on their own, learned procedures would be flexible, nonsensical errors would be corrected spontaneously, and learned procedures would easily combine to form higher-order procedures. The major example of knowledge-based procedure acquisition in arithmetic is the domain of counting. Empirical studies have shown that children know the principles of this domain, that they can construct correct and general procedures for counting without formal instruction, and that the learned procedures are flexible. The pedagogical hope expressed in the Conceptual Understanding Hypothesis is that if we teach children the conceptual basis of the arithmetic procedures, then the acquisition of those procedures will proceed in the same insightful fashion.

Evaluation of the Conceptual Understanding Hypothesis requires explicit hypotheses about (a) what is meant by understanding, the content of understanding, and how that content is represented in human memory, and about (b) the computational mechanisms by which understanding influences performance and procedure acquisition. The theory proposed in this report is based on the idea that understanding enables the learner to notice and correct his/her own mistakes. According to this theory, understanding consists of principles that constrain the possible problem states. The principles can guide performance, because the system tries to avoid solution paths that violate them. Furthermore, the principles can guide procedure acquisition, because the particular way in which a procedure violates a principle contains information about how that procedure should be revised.
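How principles can guide performance may be sketched as follows. In this Python illustration, each principle is written as a pair of tests on a search state: a relevance test (when does the principle apply?) and a satisfaction test (what must then be true?). A candidate state's cost is the number of principles it violates. The `CountState` representation and the two encodings of the one-to-one and stable-order counting principles are simplified, hypothetical choices of ours. HS itself uses patterns matched against working memory, not Python predicates.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class CountState:
    touched: tuple[str, ...]  # objects pointed at so far, in order
    said: tuple[int, ...]     # number words said so far, in order

@dataclass(frozen=True)
class StateConstraint:
    name: str
    relevance: Callable[[CountState], bool]     # when does the principle apply?
    satisfaction: Callable[[CountState], bool]  # what must then be true?

    def violated(self, s: CountState) -> bool:
        return self.relevance(s) and not self.satisfaction(s)

# Simplified, hypothetical encodings of two counting principles.
CONSTRAINTS = [
    StateConstraint(
        "one-to-one",
        relevance=lambda s: len(s.touched) > 0,
        satisfaction=lambda s: len(set(s.touched)) == len(s.touched) == len(s.said),
    ),
    StateConstraint(
        "stable-order",
        relevance=lambda s: len(s.said) > 0,
        satisfaction=lambda s: list(s.said) == list(range(1, len(s.said) + 1)),
    ),
]

def cost(s: CountState) -> int:
    """Number of violated constraints, used to rank candidate states."""
    return sum(c.violated(s) for c in CONSTRAINTS)

def best_successor(candidates: list[CountState]) -> CountState:
    return min(candidates, key=cost)

candidates = [
    CountState(("a", "a"), (1, 2)),  # touches a twice
    CountState(("a", "b"), (1, 3)),  # skips a number word
    CountState(("a", "b"), (1, 2)),
]
print([cost(s) for s in candidates])  # [1, 1, 0]
print(best_successor(candidates))
```

Note that the constraints say nothing about which step to take next. They only classify the resulting states, which is what lets them guide a search without encoding a procedure.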

We implemented the theory in a production system architecture called HS. The structure of HS
corresponds closely to the structure of heuristic search. Production rules correspond to search heuristics,
and working memory corresponds to the current search state. HS takes one step through the problem
space during each cycle of operation. The major innovation of the model is the augmentation of these
mechanisms with the state constraint representation of principled knowledge. We represent principles as
ordered pairs of patterns, where the relevance pattern circumscribes the set of situations in which a
principle is relevant, and the satisfaction pattern is a criterion which a situation has to satisfy in order to be
consistent with the principle. The two patterns are matched against search states with the same pattern
matcher that matches the conditions of production rules against working memory. The state constraints
influence performance in that the number of constraint violations serves as a cost variable in the
evaluation function for search states. The state constraints influence learning in that the HS learning
algorithm reacts to a constraint violation by replacing the faulty rule with two other rules, constrained so
as to avoid producing similar constraint violations in future applications.
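The state constraint representation described above can be sketched in a few lines of Python. This is a minimal sketch, not the HS implementation: we assume that a search state in the counting domain is a list of (object, number) pairs recorded in the order produced, and we replace the shared HS pattern matcher with ordinary Python functions. The relevance test yields one variable binding per match, and a constraint counts one violation for every relevance match whose satisfaction test fails under the same binding. The names `StateConstraint`, `count_violations`, and the two one-one mapping constraints are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical search state for counting: (object, number) pairs in the order produced.
State = List[Tuple[str, int]]
Binding = Dict[str, object]


@dataclass(frozen=True)
class StateConstraint:
    """A principle as an ordered pair of patterns (here: Python functions)."""
    name: str
    relevance: Callable[[State], Iterable[Binding]]  # one binding per situation where the principle applies
    satisfaction: Callable[[State, Binding], bool]   # criterion the state must meet under that binding

    def violations(self, state: State) -> int:
        # One violation for each relevant binding whose satisfaction test fails.
        return sum(1 for b in self.relevance(state) if not self.satisfaction(state, b))


def each_count(state: State) -> Iterable[Binding]:
    # Relevant whenever an object has been assigned a number.
    for obj, num in state:
        yield {"obj": obj, "num": num}


# Each object receives exactly one number.
one_number_per_object = StateConstraint(
    "one-number-per-object",
    each_count,
    lambda s, b: sum(1 for o, _ in s if o == b["obj"]) == 1,
)

# Each number is assigned to exactly one object.
one_object_per_number = StateConstraint(
    "one-object-per-number",
    each_count,
    lambda s, b: sum(1 for _, n in s if n == b["num"]) == 1,
)


def count_violations(constraints: List[StateConstraint], state: State) -> int:
    # The total serves as the cost term when evaluating a search state.
    return sum(c.violations(state) for c in constraints)


constraints = [one_number_per_object, one_object_per_number]
print(count_violations(constraints, [("a", 1), ("b", 2)]))            # 0
print(count_violations(constraints, [("a", 1), ("b", 2), ("a", 3)]))  # 2: "a" was counted twice
```

Writing the satisfaction test as a function of the same binding mirrors the idea that both patterns are matched against one search state with shared variables.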

We reported three applications of the HS system. The first two applications reproduce the major
phenomena with respect to children's counting: Children can construct a correct and general counting
procedure without formal instruction in counting, and they can adapt the procedure to changes in the task.
The third application investigated the behavior of HS in the domain of multi-column subtraction. If HS is
given a subtraction procedure that suffers from one or more subtraction bugs, it will correct those bugs
without external feedback, given a state constraint representation of the concepts and principles of
subtraction. These three applications constitute a substantiation of the Conceptual Understanding
Hypothesis: a learning system that can acquire a counting procedure in an insightful way has been
demonstrated to be capable of learning in the domain of multi-column subtraction as well.

The HS learning algorithm is a rational learning technique, because it derives a procedure from
knowledge rather than from experience. Rational learning processes have not been widely studied in
cognitive psychology, and there are few theoretical efforts to clarify them. The analyses most relevant to
our work are the planning net analyses by VanLehn and Brown (1980) and by Greeno, Riley, and Gelman
(1984). However, neither of these analyses attempted to provide a process model of procedure
acquisition, and neither resulted in an implemented system. There have been other attempts to formulate
learning mechanisms that make use of a declarative representation of domain knowledge, but they have
not been applied to arithmetic. The process models of procedure acquisition in arithmetic that have been
proposed are models of experience-based, rather than knowledge-based, learning. The HS system goes
beyond previous theoretical efforts in that it presents an implemented process model of knowledge-based
procedure acquisition in arithmetic.

Strengths of the state constraint theory 

The state constraint theory provides interesting and novel answers to several difficult questions with
respect to the relation between understanding and performance. It also generates qualitative predictions
which are, in principle, empirically testable. Finally, the state constraint theory fares well on other
evaluation criteria such as generality and parsimony.

Interpretation of meaningful learning

What is the difference between solving a problem correctly but blindly, and solving that same problem
correctly and with understanding? According to the state constraint theory, there is no difference in the
procedure being executed in the two situations. A procedure is just a set of dispositions to act in certain
ways under certain circumstances; it cannot be either blind or intelligent, only more or less efficient. The
theory says that understanding is present when the procedure is executed in the context of the learner's
world knowledge. Thoughtful execution consists of matching the outcomes of the procedural steps
against the concepts and principles of the relevant domain. Thoughtless execution, on the other hand,
consists of doing the steps without reflection on their outcomes. Hence, the exhortation "think about what
you are doing!" is slightly off-target; according to the state constraint theory, the better advice is "think
about the results of what you are doing!"

What is the nature of knowledge? Discussions about this question usually assume that principles are
either descriptive (e.g., "All swans are white") or predictive (e.g., "The sun will rise tomorrow"). The state
constraint theory claims that neither of these two models of principled knowledge is essential for
procedure acquisition. Instead, principled knowledge consists of constraints on the possible states of
affairs (e.g., "You cannot withdraw more money than you have in your bank account"). Conservation
laws in physics, e.g., the principle that energy cannot be destroyed or created, are examples of
constraints, as are arithmetic principles, e.g., the laws of commutativity and associativity.

What function does knowledge have in performance? What good does it do to think about the results
of what you are doing, and how are constraint principles helpful? For every procedure there will exist
situations in which that procedure is applicable, but in which it will not produce desirable results.
Intelligent behavior therefore depends on the ability to imagine the outcomes of actions, and to weed out
the mistaken actions before they are carried out. The function of principled knowledge is to enable a
person to catch and correct the mistakes that his/her procedure (any procedure) will unavoidably make
when confronted with unfamiliar situations.

This interpretation of the function of knowledge solves two technical problems that other accounts of
the function of knowledge have been unable to deal with. The first problem concerns the effect of adding
more knowledge to the system. Humans perform and learn better and faster the more they know. But all
computational mechanisms for using knowledge proposed to date suffer from combinatorial explosions:
The more knowledge the mechanism is provided with, the slower it will work and the less likely it is to
behave intelligently. For instance, the more action schemata the planning mechanism of Greeno, Riley,
and Gelman (1984) is supplied with, the harder the planning problem, because more alternatives have
to be considered at each point in the planning process. In general, mechanisms that combine knowledge
units into larger structures cannot explain why people function better the larger their knowledge base.
However, according to our theory, state constraints are not combined with each other. Each state
constraint is matched against the current search state independently of the other constraints. Hence,
there is no combinatorial explosion as the number of knowledge items grows.†

The second problem that the state constraint theory deals successfully with concerns the effect of partial
knowledge. Human beings operate very well with partial knowledge; in fact, they hardly ever operate in
any other way. But most computational mechanisms for using principled knowledge cannot function if
their knowledge base is incomplete. This is a serious problem with, for instance, explanation-based
learning (DeJong & Mooney, 1986; Mitchell, Keller, & Kedar-Cabelli, 1986). In general, techniques that
build larger knowledge structures out of smaller units (an explanation, a plan, a proof, etc.) cannot
proceed if one of the units is missing. But the state constraints in our theory are not combined into
higher-order structures. If one of the constraints is missing, the system becomes less constrained, and it
will therefore have to search more. But the power of the other constraints to guide performance and
learning is not affected. We have verified that HS can use a partial set of state constraints to guide its
performance on subtraction problems.

† The computation required to match constraints against states grows with the number of knowledge
items, but the growth need not be exponential, or even linear (Forgy, 1982).
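Because each constraint is checked on its own, removing one from the set only removes the violations that constraint would have reported; the others keep working unchanged. The sketch below illustrates this independence argument with three hypothetical counting constraints, written as plain functions that return a violation count for a counting record (a list of (object, number) pairs). It illustrates the argument only; it is not a model of the HS subtraction runs.

```python
from typing import Callable, Dict, List, Tuple

Record = List[Tuple[str, int]]  # hypothetical: (object, number) pairs in counting order
Constraint = Callable[[Record], int]


def object_counted_once(rec: Record) -> int:
    # Relevant for every counted object; satisfied if that object appears only once.
    objs = [o for o, _ in rec]
    return sum(1 for o in objs if objs.count(o) > 1)


def number_used_once(rec: Record) -> int:
    # Relevant for every number used; satisfied if no other object got the same number.
    nums = [n for _, n in rec]
    return sum(1 for n in nums if nums.count(n) > 1)


def numbers_in_sequence(rec: Record) -> int:
    # Relevant for the k-th count; satisfied if the k-th number said is k.
    return sum(1 for k, (_, n) in enumerate(rec, start=1) if n != k)


def evaluate(constraints: List[Constraint], rec: Record) -> Dict[str, int]:
    # No constraint consults another, so any subset can be evaluated as-is.
    return {c.__name__: c(rec) for c in constraints}


full = [object_counted_once, number_used_once, numbers_in_sequence]
partial = [object_counted_once, numbers_in_sequence]  # the learner lacks one principle

reused_number = [("a", 1), ("b", 1)]
print(evaluate(full, reused_number))     # {'object_counted_once': 0, 'number_used_once': 2, 'numbers_in_sequence': 1}
print(evaluate(partial, reused_number))  # {'object_counted_once': 0, 'numbers_in_sequence': 1}
```

With the partial set, the reused number is still flagged through the sequence constraint, but with less information about what went wrong; in a search setting this would translate into more search rather than a breakdown.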

What is the nature of the change that occurs during meaningful procedure acquisition? The state
constraint theory claims that the essence of learning with understanding is that structure is transferred
from declarative to procedural knowledge. When a person first confronts an unfamiliar problem
situation, he/she needs to think hard about it, because almost every action generates constraint violations.
As the procedure is gradually corrected, the state constraints need to kick into action less and less often;
execution of the procedure can be removed from reflection and becomes more mechanical. Finally, when
the procedure is correct, there is no need to consult the state constraints in order to execute it. Hence,
the acquisition of a procedure is a process of moving from acting under the influence of knowledge to
"just doing it", as common sense would have it.*

In summary, the state constraint theory locates understanding in the cognitive context within which a
procedure is executed; it assumes that knowledge constrains the possible states of affairs; and it claims
that the function of understanding is to enable the learner to catch and correct his/her mistakes. The
theory explains why the cognitive machinery does not suffer from combinatorial explosion as the number
of knowledge items grows, but on the contrary becomes more efficient. It also explains why humans can
operate well with partial knowledge. These two phenomena pose major difficulties for other
computational models of understanding. Finally, the theory explains the passage from reflection to action
during meaningful learning, because it claims that the learner only consults his/her knowledge, as it were,
when something goes wrong.

Qualitative predictions about behavioral phenomena 

The state constraint theory makes four qualitative predictions about human behavior. First, the theory
predicts that additional learning is required after the correct rules for a particular task have been
discovered. The reason is that the learning mechanism creates rules in pairs, each member of the pair
constraining the parent rule in a different way. When the correct rule is created, another, probably
incorrect, companion rule is therefore created also. At the time of creation, it is impossible to know which
of the two rules is the correct rule. HS can only identify the correct rule by evoking the rules and observing
their effects. HS gets rid of the incorrect rule by constraining it until it is over-constrained and cannot fire.
Hence, additional learning trials are necessary after the correct rule has been created in order to get rid of
the superfluous companion rule. Those learning trials will generate errors. Hence, the theory predicts that
errors will necessarily occur after the correct rule has been discovered.
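The pairwise rule creation behind this prediction can be sketched as follows. The text states only that the faulty rule is replaced by two rules that constrain the parent in different ways; the particular split shown here (one rule restricted to situations in which the violated constraint will not become relevant, the other to situations in which it will be relevant and satisfied) is our reading of the mechanism and should be treated as an assumption. We also assume that the relevance and satisfaction tests have already been rewritten as tests on the state before the rule fires; that rewriting step is omitted. `Rule`, `specialize`, the cardinality-style constraint, and the state keys are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Dict[str, int]          # hypothetical working-memory snapshot
Test = Callable[[State], bool]


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: Tuple[Test, ...]

    def matches(self, state: State) -> bool:
        return all(test(state) for test in self.conditions)


def specialize(rule: Rule, will_be_relevant: Test, will_be_satisfied: Test) -> Tuple[Rule, Rule]:
    """Replace a rule that produced a constraint violation with two more constrained rules.

    Both tests are assumed to be evaluated on the state before firing and to predict
    whether the violated constraint will be relevant / satisfied after the action.
    """
    # Version 1: fire only where the constraint will not become relevant.
    avoid = Rule(rule.name + "/1", rule.conditions + (lambda s: not will_be_relevant(s),))
    # Version 2: fire only where the constraint will be relevant and satisfied.
    satisfy = Rule(rule.name + "/2", rule.conditions + (will_be_relevant, will_be_satisfied))
    return avoid, satisfy


# Hypothetical constraint: "when the last object is counted, the number said equals the set size".
count_next = Rule("count-next", (lambda s: s["objects_left"] > 0,))
r1, r2 = specialize(
    count_next,
    will_be_relevant=lambda s: s["objects_left"] == 1,
    will_be_satisfied=lambda s: s["next_number"] == s["set_size"],
)

for state in ({"objects_left": 3, "next_number": 1, "set_size": 3},
              {"objects_left": 1, "next_number": 3, "set_size": 3},
              {"objects_left": 1, "next_number": 4, "set_size": 3}):
    print(r1.matches(state), r2.matches(state))  # (True, False), (False, True), (False, False)
```

The two new rules have disjoint applicability with respect to the relevance test, so the sketch by itself cannot tell which of them will turn out to be the useful one; only further firings, and further violations, can decide that.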

The second prediction of the HS system concerns the interaction between knowledge and
performance. Since state constraints guide performance by assigning a cost to a search state that
violates a principle, it is possible for HS to produce incorrect solutions in the presence of a complete set of
constraints. It turns out that incorrect solution paths in the subtraction domain are shorter than correct
paths. Hence, if the cost of a constraint violation is less than the cost of taking an extra step, HS prefers
the shorter path, even though it violates one or more constraints. We have verified that if HS is given an
incomplete subtraction procedure but a complete set of principles, it produces incorrect answers on some
subtraction problems for some settings of the cost parameters.
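The trade-off behind this prediction is simple arithmetic. Suppose, purely for illustration, that a correct solution path takes 5 steps with no violations and a buggy path takes 4 steps but violates one constraint (these numbers are hypothetical, not taken from the HS runs). If a path's cost is the step cost times its length plus the violation cost times its number of violations, the buggy path wins exactly when one violation costs less than the one step it saves. The sketch compares whole paths for clarity; HS itself applies its evaluation function to individual search states.

```python
from typing import List, NamedTuple


class Path(NamedTuple):
    label: str
    steps: int
    violations: int


def path_cost(p: Path, step_cost: float, violation_cost: float) -> float:
    # Length and constraint violations both add to the cost being minimized.
    return step_cost * p.steps + violation_cost * p.violations


def preferred(paths: List[Path], step_cost: float, violation_cost: float) -> str:
    return min(paths, key=lambda p: path_cost(p, step_cost, violation_cost)).label


paths = [Path("correct", steps=5, violations=0), Path("buggy", steps=4, violations=1)]
print(preferred(paths, step_cost=1.0, violation_cost=0.5))  # buggy   (4.5 < 5.0)
print(preferred(paths, step_cost=1.0, violation_cost=2.0))  # correct (5.0 < 6.0)
```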

The third prediction derived from the state constraint theory concerns the level of difficulty of learning
a particular procedure. The theory predicts that a procedure will be easy to learn to the extent that each
step in the procedure has results that can be judged for correctness on the basis of the principles of the
domain. Counting is easy to learn according to this theory, because every step in counting either follows
or violates the one-one mapping principle. Mistakes are therefore immediately detectable by someone
who knows the one-one mapping principle. A procedure is hard to learn to the extent that it contains a
large number of steps that are not on the correct solution path, but which nevertheless are consistent with
all the principles of that domain that the learner knows. In short, state constraint theory makes the
counterintuitive prediction that the larger the number of constraints that have to be satisfied by a particular
procedure, the easier that procedure is to acquire.

The fourth prediction, which we discovered in the simulation runs, is that the amount of cognitive effort
required to switch from task A to task B is not the same as the cognitive effort required to switch from task
B to task A. If HS learns to count objects in arbitrary order, it can learn to take a predefined order into
account in a single learning step. However, if it initially learns to count objects in a particular order,
learning to count objects when that order is not present requires several learning steps. This amounts to
a prediction that transfer between tasks will be greater in one direction than in the other. Such asymmetry
in transfer between related tasks is intuitively plausible.

Other evaluation criteria 

The state constraint theory is well integrated into current cognitive theory. The theory is an extension
of the major hypothesis about problem solving to emerge in the past decades, namely that problem
solving consists of heuristic search, carried out by a production system architecture. The HS model is
built out of off-the-shelf computational mechanisms that have already been proven fruitful in explaining a
wide range of cognitive phenomena. Although the simulation runs analyzed in this report are from the
domain of arithmetic, the state constraint theory is nevertheless a general theory. The computational
mechanisms of heuristic search and of production system architectures are formulated in
domain-independent terms. They are not limited to arithmetic but can, in principle, be applied to any task domain.
The mechanism of matching state constraints against search states and counting the number of
constraint violations is a general mechanism, not limited to arithmetic. The mechanism for revising rules
in response to constraint violations is also formulated in domain-independent terms. The state constraint
theory postulates a simple computational mechanism. The constraints are compared to search states
with the same pattern matcher that compares production rules to search states. Hence, no new major
computational mechanisms had to be invented in order to augment the standard theory of problem
solving with the state constraint representation.

Weaknesses and future directions 

The state constraint theory errs by being incomplete. There are several aspects of procedure
acquisition that it does not deal with, among them the role of experience, procedural errors, remote errors,
undetected errors, and the hierarchical organization of cognitive skills.

The state constraint theory as embodied in the HS simulation model does not explain the function of
experience in the learning of procedures. While the experience-based learning models for arithmetic
proposed by Neches (1981, 1982, 1987) and by VanLehn (1983a, 1983b, 1985a, 1985b, 1986) contain
no mechanisms by which principled knowledge can influence procedure acquisition, the HS model errs in
the opposite direction. It contains no mechanism by which procedures can be created by storing and
generalizing steps that experience has shown give the right results. HS only learns by deriving
procedures from its knowledge. But human beings obviously learn both by applying their knowledge and
by generalizing from experiences. The state constraint theory is therefore radically incomplete. It does
not describe how experience-based learning happens, nor how empirical and rational learning
mechanisms collaborate in the creation of procedures.

The lack of experience-based learning mechanisms prevents HS from handling purely procedural
errors, i.e., errors that cannot be described as violations of the principles of the relevant domain. Such
errors will occur under two circumstances. First, in the case of incomplete principled knowledge, there
might be errors that can, in principle, be described as principle violations, but which the system cannot, in
fact, so describe, because it does not know the relevant principle. Second, in some domains there might
be steps which are not on the correct solution path, but which are not incorrect in the sense of violating
any domain principle. For instance, in mathematical proof tasks there are a large number of proof paths
which are valid, but which do not lead to the target theorem. The learning mechanism that we have
implemented for the HS model cannot correct such errors.

The state constraint theory is also unable to deal with remote errors. The assumption that all errors
violate principles of the domain implies a simple solution to the assignment of blame problem. If all errors
violate constraints, then it is always the last rule to fire before an error is detected that needs to be
revised. If, however, there are errors that do not violate constraints, then those errors will not be detected
at the time they are made. But they could cause constraint violations several steps later. In that case the
faulty rule fired several steps before the step in which the error was detected, and identifying the rule that
is responsible for the error is a difficult problem.
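The simple blame-assignment rule that follows from this assumption, and the way a remote error defeats it, can be shown with a trace of rule firings. In the second hypothetical trace below, rule r1 is the actual culprit but violates nothing when it fires; the violation surfaces only after r3 fires, so blaming the last rule selects r3. Rule names and violation counts are invented for illustration.

```python
from typing import List, Optional, Tuple

Trace = List[Tuple[str, int]]  # (rule that fired, constraint violations in the resulting state)


def blame_last_rule(trace: Trace) -> Optional[str]:
    """Blame the rule that fired immediately before the first detected violation."""
    for rule, violations in trace:
        if violations > 0:
            return rule
    return None  # no violation detected, so no rule is revised


# Local error: the violation appears in the step where the faulty rule fires.
print(blame_last_rule([("r1", 0), ("r2", 1)]))             # r2
# Remote error (hypothetical): r1 is faulty, but the violation only shows up after r3.
print(blame_last_rule([("r1", 0), ("r2", 0), ("r3", 1)]))  # r3, not the real culprit r1
```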

The HS model is an idealization in the sense that it does not suffer from undetected errors; it is
guaranteed to discover every violation of its constraints. Clearly, people often fail to detect the errors they
make. This phenomenon can be modeled in HS by assigning a probability to the pattern matching
process that compares search states with constraints. The system would then make errors that it could,
in principle, detect, but which would, in fact, go undetected on some proportion of the trials in which they
occur. There are two reasons why we have not implemented such a mechanism in the current version of
HS. The first reason is that the structure of the HS architecture implies that if the detection of constraint
violations is probabilistic, so is the matching of production rules: both processes are carried out by the
same pattern matching mechanism. Production systems with probabilistic rule matching have not been
explored, and nothing is known about how to program them.‡ Hence, such a step is a major theoretical
move which is not immediately related to our main objective of understanding the role and function of
principled knowledge in procedure acquisition. The second reason is that little is gained by introducing
quantitative parameters without independent empirical grounding at this stage in the development of the
model.
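The difference between probabilistic and partial matching, and the reason a probabilistic matcher would affect rules and constraints alike, can be illustrated with a toy matcher over sets of working-memory elements. This is a sketch under simplifying assumptions: patterns are sets of ground facts with no variables, `threshold` and `p` are hypothetical parameters, and none of this exists in the current HS model.

```python
import random
from typing import FrozenSet


def full_match(pattern: FrozenSet[str], wm: FrozenSet[str]) -> bool:
    # Standard matching: every element of the pattern must be in working memory.
    return pattern <= wm


def partial_match(pattern: FrozenSet[str], wm: FrozenSet[str], threshold: float) -> bool:
    # Partial matching: a large enough fraction of the pattern suffices.
    if not pattern:
        return True
    return len(pattern & wm) / len(pattern) >= threshold


def probabilistic_match(pattern: FrozenSet[str], wm: FrozenSet[str], p: float, rng: random.Random) -> bool:
    # Probabilistic matching: a complete match succeeds only on a proportion p of cycles.
    return full_match(pattern, wm) and rng.random() < p


pattern = frozenset({"a", "b"})
print(partial_match(pattern, frozenset({"a"}), threshold=0.5))                # True
print(probabilistic_match(pattern, frozenset({"a"}), 1.0, random.Random(0)))  # False: never a full match

rng = random.Random(0)
wm = frozenset({"a", "b", "c"})
hits = sum(probabilistic_match(pattern, wm, 0.7, rng) for _ in range(10_000))
print(hits / 10_000)  # close to 0.7
```

If the same function were used both to test rule conditions and to test constraint patterns, as in the HS architecture, then with p below 1 rules would sometimes fail to fire and violations would sometimes go unnoticed, which is the coupling described above.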

The state constraint theory is also incomplete in that it does not deal with the hierarchical organization
of procedural skills. The HS learning algorithm does not create hierarchically organized procedures. As a
consequence, the state constraint theory cannot explain why understanding facilitates the combination of
already learned procedures into higher-order procedures, which is one of the effects hypothesized by
adherents of the Conceptual Understanding Hypothesis. In contrast, the planning mechanism
proposed by Greeno and co-workers (Greeno, Riley, & Gelman, 1984; Smith, Greeno, & Vitolo, in
press) deals readily with the hierarchical organization of cognitive skills, as does the model of procedure
induction proposed by VanLehn (1983a).

The weaknesses of the state constraint theory stem from its exclusive focus on errors that violate
principles of the domain. Future work will extend the knowledge-based learning mechanism described in
this report with one or more experience-based learning mechanisms. There are many ways to combine
experience-based and knowledge-based learning. For example, one possibility is to combine a planning
mechanism like the one proposed by Greeno et al. (1984) with the state constraint mechanism. Such a
system would learn by constructing an initial procedure through planning, and then revise it in the course
of execution if it turned out to violate principles of the domain. Many other hypotheses are possible. We do
not yet have any conclusions as to which type of combination of experience-based and knowledge-based
learning mechanisms is most likely to predict the details of human behavior.

Future work will move from a concern with explaining qualitative features of human behavior, such as
the ability to adapt a procedure to changes in the relevant task, to a concern with quantitative predictions.
We can, in principle, derive quantitative predictions from the current version of the HS model. For
instance, by running HS repeatedly on the task of learning to count, we can generate predictions about
the frequency distribution of error types§ at different levels of learning. Deriving such predictions would
be premature at the present stage of development of the model.

‡ Probabilistic matching is related to, but not identical with, partial matching (Langley, 1983a, p. 291). In
partial matching only a part of a rule pattern has to match in order for a rule to fire. In probabilistic
matching a rule will only fire on some proportion of the cycles in which its rule pattern did match completely.

One might object to the work reported here that the most radical weakness in the state constraint
theory is that it does not explain where principled knowledge comes from in the first place. However, this
objection represents a misunderstanding of the problem we set out to solve. We have tried to formulate a
theory of how knowledge of principles, once acquired, can be used in the learning of procedures; we have
not tried to explain the acquisition of principles. This way of proceeding seemingly presupposes that the
principles of a particular task domain can be known before one knows how to act in that domain, an
intuitively implausible idea.‖ However, the state constraint theory does not require that all principles are
known before procedural learning starts. This is an ideal case only. In a real learning situation we would
expect the learning of principles and the learning of procedures to be interleaved.

The idealization that all principles are acquired before procedural learning starts is appropriate for the
work reported here, because our goal was to clarify the nature of the link between understanding and
procedure acquisition hypothesized in the Conceptual Understanding Hypothesis. The pedagogical hope
expressed in that hypothesis is precisely that conceptual understanding can be the basis for procedure
acquisition. The state constraint theory is one explanation of how access to conceptual understanding
can enable a learner to discover arithmetic procedures, adapt procedures to changes in the task
environment, and self-correct nonsensical errors. Future work will address the question of how principles
are acquired.

What are the instructional implications of the state constraint theory? Suppose, for the sake of the
discussion, that we decide to adopt the theory in its current form, without augmentation with additional
learning mechanisms. The theory then implies that a procedure cannot be taught by describing the steps
in the procedure to the learner. There are no mechanisms in HS that can make use of an instruction like
"first you do X, then you do Y". In particular, the state constraint theory implies that it is not useful to tell a
learner who just committed a mistake what the correct action would have been. The theory implies that
instruction should focus on the state of the problem, not on the learner's actions. In correcting an error,
the instructor should help the learner to focus on the problem, and to see what is wrong with its current
state, reminding him/her of the principles of the domain, if necessary. The instructor should not tell the
learner what he/she should have done to avoid the error, but describe which state the problem ought to
be in, and leave it to the learner to figure out what action or actions would achieve that state. We are not
proposing that mathematics teachers revise their instruction in accordance with these implications. We are
not ready to derive specific recommendations for teaching from our theory until the theory has been
subject to stringent empirical tests. These admittedly speculative comments are meant to illustrate that
idealized computational theories of the function of understanding in the learning of arithmetic
procedures can generate rather specific implications.

§ [...] four types of errors in counting: skipping an object, counting an object repeatedly, skipping a number, using a number
[...] after all objects have been counted.